Abstract

Soft-input soft-output (SISO) detection algorithms form the basis for iterative decoding. The
computational complexity of SISO detection often poses significant challenges for practical receiver
implementations, in particular in the context of multiple-input multiple-output (MIMO) wireless
communication systems. In this paper, we present a low-complexity SISO sphere-decoding algorithm,
based on the single tree-search paradigm proposed originally for soft-output MIMO detection in
Studer, et al., IEEE J-SAC, 2008. The new algorithm incorporates clipping of the extrinsic
log-likelihood ratios (LLRs) into the tree search, which results in significant complexity savings
and makes it possible to cover a large performance/complexity tradeoff region by adjusting a single
parameter. Furthermore, we propose a new method for correcting approximate LLRs (resulting from
sub-optimal detectors), which (often significantly) improves detection performance at low
additional computational complexity.

I. Introduction

Soft-input soft-output (SISO) detection constitutes the basis for iterative decoding in multiple- 
input multiple-output (MIMO) systems, which, in general, achieves significantly better (error- 
rate) performance than decoding based on hard-output or soft-output-only detection algorithms [1]. 
Unfortunately, this performance gain comes at the cost of a significant (often prohibitive in terms 
of practical implementation) increase in computational complexity. 

Various SISO detection algorithms for MIMO systems offering different performance/complexity
tradeoffs have been proposed in the literature, see e.g., [1]-[6]. However,
implementing different algorithms, each optimized for a maximum allowed detection effort or 
for a particular system configuration, would entail considerable circuit complexity. A practical 
SISO detector for MIMO systems should therefore cover a wide range of performance/complexity 
tradeoffs and be easily adjustable through a single tunable detection algorithm. 

Soft-output single tree-search (STS) sphere decoding (SD) in combination with log-likelihood 
ratio (LLR) clipping [7] has been demonstrated to be suitable for VLSI implementation and
makes it convenient to tune detection performance between maximum-likelihood (ML) a posteriori
probability (APP) soft-output detection and (low-complexity) hard-output detection. The
STS-SD concept is therefore a promising basis for efficient SISO detection in MIMO systems. 

Contributions: We describe a SISO STS-SD algorithm that is tunable between max-log optimal 
SISO and hard-output maximum a posteriori (MAP) detection performance. To this end, we 
extend the soft-output STS-SD algorithm introduced in [7], [8] by a max-log optimal a priori 
information processing method, which significantly reduces the tree-search complexity compared 
to, e.g., [3], [5], [6], [9], [10], and avoids the computation of transcendental functions. The basic 
idea for complexity reduction and to achieve tunability of the algorithm is to incorporate clipping 
of the extrinsic LLRs into the tree search. This requires that the list administration concept and 
the tree-pruning criterion proposed for soft-output STS-SD in [7] be suitably modified. We 
furthermore propose a method for compensation of self-interference in the LLRs — caused by 
channel-matrix regularization — directly in the tree search. In addition, we describe a new method 
for correcting approximate LLRs — resulting from sub-optimal detectors — which (often signifi- 
cantly) improves detection performance at low additional computational complexity. Simulation 
results show that the resulting SISO STS-SD algorithm operates close to outage capacity at 
remarkably low computational complexity. In addition, the algorithm offers a significantly larger
performance/complexity tradeoff region than the soft-output STS-SD algorithm proposed in [7]. 

Notation: Matrices are set in boldface capital letters, vectors in boldface lowercase letters. The
superscripts $^T$ and $^H$ stand for transpose and conjugate transpose, respectively. We write $A_{i,j}$
for the entry in the $i$th row and $j$th column of the matrix $\mathbf{A}$ and $b_i$ for the $i$th entry of the
vector $\mathbf{b} = [\,b_1 \cdots b_N\,]^T$. The $\ell^2$-norm of the vector $\mathbf{b}$ is denoted by $\|\mathbf{b}\|$.
$\mathbf{I}_N$ and $\mathbf{0}_{M \times N}$ refer to the $N \times N$ identity matrix and the $M \times N$ all-zero matrix,
respectively. Slightly abusing common terminology, we call an $M \times N$ matrix $\mathbf{A}$, where $M \geq N$,
satisfying $\mathbf{A}^H\mathbf{A} = \mathbf{I}_N$, unitary. $|\mathcal{O}|$ denotes the cardinality of the set $\mathcal{O}$. The
probability of an event $Z$ is referred to as $\mathrm{P}[Z]$, the probability density function of a
continuous random variable (RV) $z$ is denoted by $p(z)$, and $\mathbb{E}[Z]$ stands for the expectation of
the RV $Z$. $\overline{x}$ is the binary complement of $x \in \{+1, -1\}$, i.e., $\overline{x} = -x$.

Outline: The remainder of this paper is organized as follows. Section II reviews the transformation
of soft-input soft-output MIMO detection into a tree-search problem and presents new
methods for tightening of the tree-pruning criterion and for incorporating a priori information
into the tree search. Section III describes the new SISO STS-SD algorithm. In Section IV, we
propose a method for compensating the impact of channel-matrix regularization on LLRs directly
in the tree search. A new technique for computationally efficient correction of approximate
LLRs (resulting from the max-log approximation, channel-matrix regularization, and early
termination [7], [11]) is presented in Section V. Simulation results are provided in Section VI.
We conclude in Section VII.

II. Soft-Input Soft-Output Sphere Decoding 

Consider a MIMO system with $M_T$ transmit and $M_R \geq M_T$ receive antennas. The coded
bit-stream to be transmitted is mapped to (a sequence of) $M_T$-dimensional transmit symbol
vectors $\mathbf{s} \in \mathcal{O}^{M_T}$, where $\mathcal{O}$ stands for the underlying complex scalar constellation¹ and
$|\mathcal{O}| = 2^Q$. Each symbol vector $\mathbf{s}$ is associated with a label vector $\mathbf{x}$ containing $M_T Q$ binary
values chosen from the set $\{+1,-1\}$, where the null element (0 in binary logic) of GF(2) corresponds to $+1$.

¹The algorithm developed in this paper can also be formulated for the case where different constellations are used on different
transmit antennas. However, for the sake of simplicity of exposition, we restrict ourselves to employing the same constellation 
on all transmit antennas. 
The corresponding bits are denoted by $x_{i,b}$, where the indices $i$ and $b$ refer to the $b$th bit in the
binary label of the $i$th entry of the symbol vector $\mathbf{s} = [\,s_1 \cdots s_{M_T}\,]^T$. The associated complex
baseband input-output relation is given by

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{1}$$

where $\mathbf{H}$ stands for the $M_R \times M_T$ channel matrix, $\mathbf{y}$ is the $M_R$-dimensional received signal
vector, and $\mathbf{n}$ is an i.i.d. circularly symmetric complex Gaussian distributed $M_R$-dimensional
noise vector with variance $N_0$ per complex entry. Different transmit powers on the individual
transmit antennas are assumed to be absorbed in the channel matrix H, which — including the 
corresponding scaling factors — will be referred to as the physical MIMO channel. Throughout 
the paper, we consider coherent detection, i.e., the receiver knows the realization of the channel 
matrix H perfectly. 



A. Max-Log LLR Computation as a Tree Search 

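Coherent SISO detection for MIMO systems requires computation of the LLRs [1]

$$L_{i,b} \triangleq \log\!\left(\frac{\mathrm{P}[x_{i,b} = +1 \mid \mathbf{y}, \mathbf{H}]}{\mathrm{P}[x_{i,b} = -1 \mid \mathbf{y}, \mathbf{H}]}\right) \tag{2}$$

for all bits $i = 1, \ldots, M_T$, $b = 1, \ldots, Q$, in the label $\mathbf{x}$. Bayes's theorem applied to (2) leads to
the equivalent formulation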



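$$L_{i,b} = \log\!\Bigg(\sum_{\mathbf{s} \in \mathcal{X}^{(+1)}_{i,b}} p(\mathbf{y} \mid \mathbf{s}, \mathbf{H})\,\mathrm{P}[\mathbf{s}]\Bigg) - \log\!\Bigg(\sum_{\mathbf{s} \in \mathcal{X}^{(-1)}_{i,b}} p(\mathbf{y} \mid \mathbf{s}, \mathbf{H})\,\mathrm{P}[\mathbf{s}]\Bigg) \tag{3}$$

where $\mathcal{X}^{(+1)}_{i,b}$ and $\mathcal{X}^{(-1)}_{i,b}$ are the sets of symbol vectors that have the bit corresponding to the
indices $i$ and $b$ equal to $+1$ and $-1$, respectively, $\mathrm{P}[\mathbf{s}]$ corresponds to the prior, and

$$p(\mathbf{y} \mid \mathbf{s}, \mathbf{H}) = \frac{1}{(\pi N_0)^{M_R}} \exp\!\left(-\frac{\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2}{N_0}\right)$$

is the likelihood implied by the Gaussian noise model in (1).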

Straightforward evaluation of (3) requires the computation of $2^{Q M_T}$ Euclidean distances per
LLR value, which, in general, leads to prohibitive computational complexity. We therefore
employ the standard max-log approximation² on (3), which enables us to reformulate the LLR
computation problem as a weighted tree-search problem that can be solved efficiently using the
SD algorithm [7], [8], [13]-[20]. To this end, the channel matrix $\mathbf{H}$ is first QR-decomposed
according to $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where the $M_R \times M_T$ matrix $\mathbf{Q}$ is unitary and the $M_T \times M_T$
upper-triangular matrix $\mathbf{R}$ has real-valued positive entries on its main diagonal. Left-multiplying (1)
by $\mathbf{Q}^H$ leads to the modified input-output relation

$$\tilde{\mathbf{y}} = \mathbf{R}\mathbf{s} + \mathbf{Q}^H\mathbf{n} \tag{4}$$

where $\tilde{\mathbf{y}} = \mathbf{Q}^H\mathbf{y}$ and $\mathbf{Q}^H\mathbf{n}$ is also i.i.d. circularly symmetric complex Gaussian with variance
$N_0$ per complex entry. In the following, we consider an iterative MIMO decoder as depicted
in Fig. 1. The soft-input soft-output MIMO detector computes intrinsic max-log LLRs according
to [1]

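$$L^D_{i,b} \triangleq \min_{\mathbf{s} \in \mathcal{X}^{(-1)}_{i,b}} \left\{ \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 - \log \mathrm{P}[\mathbf{s}] \right\} - \min_{\mathbf{s} \in \mathcal{X}^{(+1)}_{i,b}} \left\{ \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 - \log \mathrm{P}[\mathbf{s}] \right\}, \tag{5}$$

where the prior $\mathrm{P}[\mathbf{s}]$ is, e.g., delivered by an outer channel decoder in the form of a priori LLRs

$$L^A_{i,b} \triangleq \log\!\left(\frac{\mathrm{P}[x_{i,b} = +1]}{\mathrm{P}[x_{i,b} = -1]}\right), \quad \forall i, b.$$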

Based on the intrinsic LLRs in (5), the detector computes the extrinsic LLRs

$$L^E_{i,b} \triangleq L^D_{i,b} - L^A_{i,b}, \quad \forall i, b, \tag{6}$$

that are passed to a subsequent SISO channel decoder. Note that we neglected the additive
constant in each of the two minima in (5) that results from the part of the noise $\mathbf{n}$ that is
orthogonal to the range space of $\mathbf{H}$. This is possible as the constant in question is independent
of $\mathbf{s}$ and, hence, cancels out upon taking the difference in (5).
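
As a reference point for the tree-search formulation that follows, the max-log LLRs in (5) and (6) can be evaluated by exhaustive enumeration for very small systems. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: BPSK on every stream (so each symbol equals its label bit and $Q = 1$), a real-valued channel, and statistically independent bits, for which $-\log \mathrm{P}[\mathbf{s}]$ equals $-\sum_i \frac{1}{2} x_i L^A_i$ up to a constant that cancels in the difference. The function name `maxlog_llrs` is hypothetical.

```python
import itertools

def maxlog_llrs(y, H, N0, LA):
    """Brute-force max-log intrinsic (L^D) and extrinsic (L^E) LLRs.

    Assumptions of this sketch: BPSK symbols (symbol == label bit),
    one a priori LLR per stream, independent bits.
    """
    MT = len(LA)

    def metric(x):
        # Euclidean term ||y - Hx||^2 / N0
        dist = sum(abs(y[r] - sum(H[r][c] * x[c] for c in range(MT))) ** 2
                   for r in range(len(y))) / N0
        # Prior term -log P[x], up to an additive constant independent of x
        prior = -sum(0.5 * x[i] * LA[i] for i in range(MT))
        return dist + prior

    metrics = {x: metric(x) for x in itertools.product((+1, -1), repeat=MT)}
    LD, LE = [], []
    for i in range(MT):
        m_minus = min(m for x, m in metrics.items() if x[i] == -1)
        m_plus = min(m for x, m in metrics.items() if x[i] == +1)
        LD.append(m_minus - m_plus)      # intrinsic LLR, cf. (5)
        LE.append(LD[-1] - LA[i])        # extrinsic LLR, cf. (6)
    return LD, LE

# Single-stream example: the a priori LLR shifts L^D but not L^E.
LD, LE = maxlog_llrs(y=[1.0], H=[[1.0]], N0=1.0, LA=[2.0])
print(LD, LE)
```

The enumeration visits all $2^{M_T}$ labels, which is exactly the cost the tree search is designed to avoid.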
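For each bit, one of the two minima in (5) corresponds to

$$\lambda^{\mathrm{MAP}} \triangleq \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}^{\mathrm{MAP}}\|^2 - \log \mathrm{P}[\mathbf{s}^{\mathrm{MAP}}] \tag{7}$$
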



²The max-log approximation corresponds to $\log\big(\sum_k \exp(a_k)\big) \approx \max_k\{a_k\}$ and entails a performance loss compared to
using the exact LLRs in (3). As shown in [12], this loss is small, in general.
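which is associated with the MAP solution of the MIMO detection problem

$$\mathbf{s}^{\mathrm{MAP}} = \arg\min_{\mathbf{s} \in \mathcal{O}^{M_T}} \left\{ \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 - \log \mathrm{P}[\mathbf{s}] \right\}. \tag{8}$$

The other minimum in (5) can be computed as

$$\lambda^{\overline{\mathrm{MAP}}}_{i,b} \triangleq \min_{\mathbf{s} \in \mathcal{X}^{(\overline{x^{\mathrm{MAP}}_{i,b}})}_{i,b}} \left\{ \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 - \log \mathrm{P}[\mathbf{s}] \right\} \tag{9}$$

where $\overline{x^{\mathrm{MAP}}_{i,b}}$ denotes the (bit-wise) counter-hypothesis to the MAP hypothesis. With the
definitions (7) and (9), the intrinsic max-log LLRs in (5) can be written ($\forall i, b$) in compact form
as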



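$$L^D_{i,b} = \begin{cases} \lambda^{\overline{\mathrm{MAP}}}_{i,b} - \lambda^{\mathrm{MAP}}, & x^{\mathrm{MAP}}_{i,b} = +1 \\ \lambda^{\mathrm{MAP}} - \lambda^{\overline{\mathrm{MAP}}}_{i,b}, & x^{\mathrm{MAP}}_{i,b} = -1. \end{cases} \tag{10}$$

We can therefore conclude that efficient max-log-optimal soft-input soft-output MIMO detection
reduces to efficiently identifying $\mathbf{s}^{\mathrm{MAP}}$, $\lambda^{\mathrm{MAP}}$, and $\lambda^{\overline{\mathrm{MAP}}}_{i,b}$ ($\forall i, b$).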

We next define the partial symbol vectors (PSVs) $\mathbf{s}^{(i)} = [\,s_i \cdots s_{M_T}\,]^T$ and note that they
can be arranged in a tree that has its root just above level $i = M_T$ and leaves, on level $i = 1$,
which correspond to symbol vectors $\mathbf{s}$. The binary-valued label vector associated with $\mathbf{s}^{(i)}$ will
be denoted by $\mathbf{x}^{(i)}$. The distances

$$d(\mathbf{s}) = \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 - \log \mathrm{P}[\mathbf{s}] \tag{11}$$


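in (7) and (9) can be computed recursively if the following factorization holds:

$$\mathrm{P}[\mathbf{s}] = \prod_{i=1}^{M_T} \mathrm{P}\big[\mathbf{s}^{(i)}\big], \tag{12}$$

which is assumed from now on. Note that in practice, the symbols $s_i$ ($i = 1, \ldots, M_T$) are often
statistically independent across spatial streams; this satisfies (12) trivially with $\mathrm{P}[\mathbf{s}] = \prod_{i=1}^{M_T} \mathrm{P}[s_i]$.
We can now rewrite (11) as

$$d(\mathbf{s}) = \sum_{i=1}^{M_T} \left( \frac{1}{N_0}\Big|\tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j\Big|^2 - \log \mathrm{P}\big[\mathbf{s}^{(i)}\big] \right)$$

which can be evaluated recursively as $d(\mathbf{s}) = d_1$, with the partial distances (PDs)

$$d_i = d_{i+1} + |e_i|, \quad i = M_T, \ldots, 1,$$

the initialization $d_{M_T+1} = 0$, and the distance increments (DIs)

$$|e_i| \triangleq \frac{1}{N_0}\Big|\tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j\Big|^2 - \log \mathrm{P}\big[\mathbf{s}^{(i)}\big]. \tag{13}$$
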

Note that the DIs are non-negative since the prior terms satisfy $-\log \mathrm{P}[\mathbf{s}^{(i)}] \geq 0$. The dependence
of the PD $d_i$ on the symbol vector $\mathbf{s}$ is, thanks to the upper triangularity of $\mathbf{R}$ and the
assumption (12), only through the PSV $\mathbf{s}^{(i)}$. Thus, the MAP detection problem and the computation of
the intrinsic max-log LLRs have been transformed into a tree-search problem: PSVs and PDs are
associated with nodes, branches correspond to DIs. For brevity, we shall often say "the node $\mathbf{s}^{(i)}$"
to refer to the node corresponding to the PSV $\mathbf{s}^{(i)}$. We shall furthermore use $d(\mathbf{s}^{(i)})$ and $d(\mathbf{x}^{(i)})$
interchangeably to denote $d_i$. Each path from the root node down to a leaf node corresponds to
a symbol vector $\mathbf{s} \in \mathcal{O}^{M_T}$. The result in (7) and (9) corresponds to the leaf associated with the
smallest metric in $\mathcal{O}^{M_T}$ and $\mathcal{X}^{(\overline{x^{\mathrm{MAP}}_{i,b}})}_{i,b}$, respectively. The SISO STS-SD algorithm uses elements
of Schnorr-Euchner SD (SESD) [15], [21], briefly summarized as follows: The search in the
weighted tree is constrained to nodes which lie within a radius³ $r$ around $\tilde{\mathbf{y}}$ and tree traversal is
performed depth-first, visiting the children of a given node in ascending order of their PDs. A
node $\mathbf{s}^{(i)}$ with PD $d_i$ can be pruned (along with the entire subtree originating from this node)
whenever the tree-pruning criterion

$$d_i \geq r^2 \tag{14}$$

is satisfied. In the remainder of this paper, (14) is referred to as the "standard pruning criterion."

The radius $r$ has to be chosen sufficiently large such that the SD algorithm finds at least the
MAP solution. Choosing $r$ too large leads to high complexity, as a large number of nodes do
not satisfy the pruning criterion. In order to avoid the problem of choosing a suitable radius $r$
altogether, we employ a technique known as radius reduction [21], which consists of initializing
the algorithm with $r = \infty$, and performing the update $r^2 \leftarrow d(\mathbf{s})$ whenever a valid leaf node $\mathbf{s}$
has been found.
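
The depth-first traversal with Schnorr-Euchner ordering and radius reduction described above can be sketched as follows. This is a minimal hard-output illustration, not the SISO STS-SD algorithm itself: it assumes no a priori information (so the DIs reduce to the Euclidean term), takes the alphabet as a plain tuple, and uses the hypothetical function name `sesd`. Rows are indexed from 0, so tree level $i$ corresponds to row `i - 1`.

```python
import math

def sesd(y, R, N0, alphabet=(+1.0, -1.0)):
    """Hard-output depth-first sphere decoding with radius reduction.

    y: transformed receive vector (length MT); R: upper-triangular MT x MT.
    Returns (best symbol vector, its metric, number of visited nodes).
    """
    MT = len(y)
    state = {"s": None, "r2": math.inf, "visited": 0}

    def search(row, partial, pd):
        # 'partial' holds the symbols already fixed for rows row+1 .. MT-1.
        children = []
        for sym in alphabet:
            s = [sym] + partial
            interference = sum(R[row][row + k] * s[k] for k in range(len(s)))
            di = abs(y[row] - interference) ** 2 / N0     # distance increment
            children.append((pd + di, sym))
        children.sort(key=lambda c: c[0])                 # ascending PD order
        for d, sym in children:
            if d >= state["r2"]:      # pruning test against the squared radius
                break                 # remaining children have even larger PDs
            state["visited"] += 1
            if row == 0:              # leaf reached: radius reduction
                state["s"], state["r2"] = [sym] + partial, d
            else:
                search(row - 1, [sym] + partial, d)

    search(MT - 1, [], 0.0)
    return state["s"], state["r2"], state["visited"]

# Noise-free example: y = R s for s = [+1, -1].
print(sesd(y=[1.0, -1.0], R=[[2.0, 1.0], [0.0, 1.0]], N0=1.0))
```

In this sketch the counter tracks only nodes that pass the pruning test, which is one possible reading of "visited"; the search starts with an infinite radius, so the first leaf reached always updates it.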

The complexity measure used in the remainder of the paper corresponds to the total number 
of nodes visited by the detector, including the leaf nodes, but excluding the root node. Note 
that this measure was shown in [22] to be representative of the hardware complexity of a VLSI 
implementation of hard-output SESD. 

³Note that r corresponds to the radius of a hypersphere if the prior satisfies P[s] = 0.



B. Tightening of the Tree-Pruning Criterion

Tightening of the tree-pruning criterion (14), i.e., reduction of the right-hand side (RHS)
of (14), without sacrificing (max-log) optimality is highly desirable as it reduces the (tree-search)
complexity. Such a reduction can be accomplished, for example, through techniques based on
semi-definite relaxation and $H^\infty$-estimation theory as proposed in [23]. Unfortunately, these
approaches entail, in general, a high computational complexity and are, hence, not well-suited
for practical (VLSI) implementation.

In the following, we propose an alternative approach which relies on the observation that the
DIs (13) contain a (generally non-zero) bias given by

$$|b_i| \triangleq \min_{\mathbf{s}^{(i)} \in \mathcal{O}^{M_T+1-i}} |e_i|, \quad i = 1, \ldots, M_T. \tag{15}$$

Consider the case where the detector stands at node $\mathbf{s}^{(i)}$ on level $i$ with corresponding PD $d_i$.
All leaf-level PDs $d_1$ that can be reached from the node $\mathbf{s}^{(i)}$ satisfy

$$d_1 \geq d_i + \sum_{j=1}^{i-1} |b_j|. \tag{16}$$

At level $i$, we can therefore prune every node that satisfies a tightened version of the tree-pruning
criterion in (14), namely

$$d_i \geq r^2 - \sum_{j=1}^{i-1} |b_j|. \tag{17}$$


Computation of the bias term (15) requires enumeration of $|e_i|$ over all $\mathbf{s}^{(i)} \in \mathcal{O}^{M_T+1-i}$, which,
in general, leads to prohibitive computational complexity. The major portion of this complexity
is caused by the computation of the Euclidean distance-term $\frac{1}{N_0}\big|\tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j\big|^2$ in (13),
whose contribution to the bias (15), as it turns out (corresponding simulation results are shown
in Section VI-A1), is negligible. Hence, we only consider the contribution to $|b_i|$ caused by the
prior term $-\log \mathrm{P}[\mathbf{s}^{(i)}]$ and we define accordingly

$$|p_i| \triangleq \min_{\mathbf{s}^{(i)} \in \mathcal{O}^{M_T+1-i}} \big\{ -\log \mathrm{P}\big[\mathbf{s}^{(i)}\big] \big\}. \tag{18}$$

The corresponding tightened tree-pruning criterion is then given by

$$d_i \geq r^2 - \sum_{j=1}^{i-1} |p_j|. \tag{19}$$

For the case of the individual symbols $s_i$ ($i = 1, \ldots, M_T$) being statistically independent, i.e.,
$\mathrm{P}[\mathbf{s}] = \prod_{i=1}^{M_T} \mathrm{P}[s_i]$, we have $|p_i| = \min_{s_i \in \mathcal{O}} \{ -\log \mathrm{P}[s_i] \}$, so that computation of the RHS
of (18) results in significantly smaller complexity than that required to compute the RHS of (15).

We emphasize that using the tightened tree-pruning criterion (19) preserves max-log optimality
and leads, in general, to significant complexity savings, when compared to the standard pruning
criterion (14), which is widely adopted in the literature [3], [5], [6], [9], [10]. To see this, consider
the case where all constellation points are equally likely⁴, i.e., $\mathrm{P}[s_i] = |\mathcal{O}|^{-1}$ for all $s_i \in \mathcal{O}$
and $i = 1, \ldots, M_T$. The corresponding total bias from level $i$ down to the leaf level is given
by $\sum_{j=1}^{i-1} |p_j| = (i - 1) \log |\mathcal{O}|$, which can be large, especially for nodes close to the root. Since
pruning at and close to the root level has, in general, significant impact on the number of nodes
visited in the tree search, the tightened tree-pruning criterion (19) can lead to a major complexity
reduction. Corresponding simulation results are provided in Section VI-A2.
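
To make the size of this bias concrete, the following sketch evaluates the RHS of the tightened criterion for statistically independent symbols. The function names and the numerical values (a squared radius of 20 and equiprobable 16-point constellations on four streams) are illustrative assumptions, not values from the paper.

```python
import math

def level_bias(symbol_probs):
    """Prior bias of one level: smallest -log P[s_i] over the constellation."""
    return min(-math.log(p) for p in symbol_probs if p > 0.0)

def tightened_threshold(r2, level, probs_per_level):
    """RHS of the tightened criterion at 1-based level i.

    probs_per_level[j - 1] lists the symbol probabilities at level j;
    the biases of levels 1 .. i-1 are subtracted from the squared radius.
    """
    return r2 - sum(level_bias(probs_per_level[j]) for j in range(level - 1))

# Equiprobable 16-point constellation on M_T = 4 streams.
probs = [[1.0 / 16] * 16 for _ in range(4)]
for level in (4, 3, 2, 1):
    print(level, tightened_threshold(20.0, level, probs))
```

At the root-side level 4 the threshold drops by $3 \log 16 \approx 8.3$, whereas at the leaf level it coincides with the standard criterion.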

C. Tree Search in the Case of Statistically Independent Bits 

We have seen above that statistical independence among individual symbols enables us to 
tighten the tree-pruning criterion at low additional computational complexity. For bit-interleaved 
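coded modulation [24], in addition the bits $x_{i,b}$ ($i = 1, \ldots, M_T$, $b = 1, \ldots, Q$) are statistically
independent. As shown next, this independence on the bit-level can be exploited to get further
reductions in computational complexity. To see this, consider the case where the MIMO detector
obtains a priori LLRs $L^A_{i,b}$ ($\forall i, b$) from an external device, e.g., a SISO channel decoder as depicted
in Fig. 1. We then have [25]

$$\mathrm{P}[s_i] = \prod_{b=1}^{Q} \frac{\exp\!\big(\tfrac{1}{2}(1 + x_{i,b}) L^A_{i,b}\big)}{1 + \exp\!\big(L^A_{i,b}\big)}$$

which can be reformulated in more compact form as

$$\mathrm{P}[s_i] = \prod_{b=1}^{Q} \frac{\exp\!\big(-\tfrac{1}{2}\big(|L^A_{i,b}| - x_{i,b} L^A_{i,b}\big)\big)}{1 + \exp\!\big(-|L^A_{i,b}|\big)}. \tag{20}$$
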

The contribution of the a priori LLRs to the prior term in the DIs in (13) can then be obtained
from (20) as

$$-\log \mathrm{P}[s_i] = K_i - \sum_{b=1}^{Q} \frac{1}{2} x_{i,b} L^A_{i,b} \tag{21}$$

⁴This, for example, is the case when no a priori information is available and all transmitted bits are equally likely.
where the constants

$$K_i = \sum_{b=1}^{Q} \left( \frac{1}{2}|L^A_{i,b}| + \log\!\big(1 + \exp(-|L^A_{i,b}|)\big) \right) \tag{22}$$

are independent of the binary-valued variables $x_{i,b}$ and $K_i > 0$ for $i = 1, \ldots, M_T$. Because of
$-\log \mathrm{P}[s_i] \geq 0$, we can trivially infer from (21) that $K_i - \sum_{b=1}^{Q} \frac{1}{2} x_{i,b} L^A_{i,b} \geq 0$. From (10) it
follows that constant terms (i.e., terms that are independent of the variables $x_{i,b}$ and hence of $\mathbf{s}$)
in (7) and (9) cancel out in the computation of the intrinsic LLRs $L^D_{i,b}$ ($\forall i, b$) and can therefore
be neglected. A straightforward method to avoid the hardware-inefficient task of computing
transcendental functions in (22) is to set $K_i = 0$ in the computation of (21). This can, however,
lead to branch metrics that are not necessarily non-negative, which would inhibit pruning of the
search tree. On the other hand, modifying the DIs in (13) by setting

$$|e_i| \triangleq \frac{1}{N_0}\Big|\tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j\Big|^2 + \tilde{K}_i - \sum_{b=1}^{Q} \frac{1}{2} x_{i,b} L^A_{i,b} \tag{23}$$

with $\tilde{K}_i = \sum_{b=1}^{Q} \frac{1}{2}|L^A_{i,b}|$ avoids computing transcendental functions while guaranteeing
that, thanks to $|L^A_{i,b}| - x_{i,b} L^A_{i,b} \geq 0$ ($\forall i, b$), the so obtained branch metrics are non-negative.
Furthermore, as $K_i > \tilde{K}_i$, using the modified DIs (often significantly) reduces the (tree-search)
complexity compared to that implied by (13) using (21) and, thanks to (10), still yields
max-log-optimal LLRs. The reason for complexity reduction when using the modified DIs (23) lies
in the modified prior term being bias-free, i.e.,

$$\min_{s_i \in \mathcal{O}} \left\{ \tilde{K}_i - \sum_{b=1}^{Q} \frac{1}{2} x_{i,b} L^A_{i,b} \right\} = 0, \quad \forall i, \tag{24}$$

which directly leads to tight tree pruning using the standard pruning criterion in (14) and hence,
avoids explicit evaluation of (18).

Note that in [5, Eq. 9], the prior term (21) was approximated as

$$-\log \mathrm{P}[s_i] \approx \sum_{b=1}^{Q} \frac{1}{2}\big(|L^A_{i,b}| - x_{i,b} L^A_{i,b}\big)$$

for $|L^A_{i,b}| > 2$ ($b = 1, \ldots, Q$), which corresponds exactly to what was done here in order to
arrive at (23). It is important, though, to realize that using the modified DIs (23) does not lead
to an approximation of (10), as only differences are considered in the intrinsic max-log LLR
computation and the neglected $\log(\cdot)$-term does not depend on $x_{i,b}$.
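
The relation between the exact prior term and its modified, transcendental-free counterpart can be checked numerically. The sketch below assumes statistically independent bits; the function names and the example a priori LLR values are hypothetical. It confirms that the two terms differ only by a label-independent constant, and that the modified term is non-negative with minimum zero.

```python
import itertools
import math

def exact_prior_term(x_bits, LA):
    """-log P[s_i] for independent bits, with the constant K_i included."""
    K = sum(0.5 * abs(L) + math.log1p(math.exp(-abs(L))) for L in LA)
    return K - sum(0.5 * x * L for x, L in zip(x_bits, LA))

def modified_prior_term(x_bits, LA):
    """Modified prior term: only magnitudes, sign flips, and additions."""
    return sum(0.5 * (abs(L) - x * L) for x, L in zip(x_bits, LA))

LA = [1.5, -0.7, 0.0]                      # example a priori LLRs for Q = 3 bits
labels = list(itertools.product((+1, -1), repeat=len(LA)))

offsets = [exact_prior_term(x, LA) - modified_prior_term(x, LA) for x in labels]
print(max(offsets) - min(offsets))         # spread of the offset across labels
print(min(modified_prior_term(x, LA) for x in labels))
```

Because the offset does not depend on the label, it cancels in every LLR difference, which is why dropping it does not change the max-log LLRs.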
III. Extrinsic LLR Computation in a Single Tree Search

Computing the intrinsic max-log LLRs in (10) requires determining $\lambda^{\mathrm{MAP}}$ and the
metrics $\lambda^{\overline{\mathrm{MAP}}}_{i,b}$ associated with the counter-hypotheses. For given $i$ and $b$, $\lambda^{\overline{\mathrm{MAP}}}_{i,b}$ is obtained by
traversing only those parts of the search tree that have leaves in $\mathcal{X}^{(\overline{x^{\mathrm{MAP}}_{i,b}})}_{i,b}$. The quantities $\lambda^{\mathrm{MAP}}$
and $\lambda^{\overline{\mathrm{MAP}}}_{i,b}$ can, in principle, be computed using the sphere decoder based on the repeated
tree-search (RTS) approach described in [19]. The RTS strategy results, however, in redundant
computations as (often significant) parts of the search tree are revisited during the RTS steps
required to determine $\lambda^{\overline{\mathrm{MAP}}}_{i,b}$ for all $i$, $b$. Following the STS paradigm described for soft-output
SD in [7], we note that efficient computation of $L^D_{i,b}$ ($\forall i, b$) requires that every node in the
tree be visited at most once. This can be achieved by searching for the MAP solution and
computing the metrics $\lambda^{\overline{\mathrm{MAP}}}_{i,b}$ ($\forall i, b$) concurrently while ensuring that the subtree originating
from a given node in the tree is pruned if searching that subtree cannot lead to an update of
either $\lambda^{\mathrm{MAP}}$ or at least one of the $\lambda^{\overline{\mathrm{MAP}}}_{i,b}$. Besides extending the ideas in [7] to take into account a
priori information, the main idea underlying SISO STS-SD presented in this paper is to directly
compute the extrinsic LLRs $L^E_{i,b}$ through a tree search, rather than computing $L^D_{i,b}$ first and then
evaluating $L^E_{i,b} = L^D_{i,b} - L^A_{i,b}$ ($\forall i, b$).

Due to the large dynamic range of LLRs, fixed-point detector implementations need to con- 
strain the magnitude of the LLR values. Evidently, clipping of the LLR magnitude leads to a 
performance degradation in terms of error rate. It was noted in [7], [26] that incorporating LLR 
clipping into the tree search is very effective in terms of reducing the complexity of max-log 
soft-output SD. In addition, as demonstrated in [7], LLR clipping (when built into the tree search)
also makes it possible to tune the MIMO detection algorithm in terms of complexity versus
performance by adjusting the clipping parameter. In the SISO case, we are ultimately interested in
the extrinsic LLRs $L^E_{i,b}$ and clipping should therefore ensure that $|L^E_{i,b}| \leq L_{\max}$ ($\forall i, b$), where
$L_{\max}$ is the LLR clipping parameter. It is therefore sensible to ask whether clipping of the extrinsic LLRs
can be built directly into the tree search. The answer is in the affirmative and the corresponding
solution is described below. We start by writing the extrinsic LLRs as



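$$L^E_{i,b} = \begin{cases} \Lambda^{\overline{\mathrm{MAP}}}_{i,b} - \lambda^{\mathrm{MAP}}, & x^{\mathrm{MAP}}_{i,b} = +1 \\ \lambda^{\mathrm{MAP}} - \Lambda^{\overline{\mathrm{MAP}}}_{i,b}, & x^{\mathrm{MAP}}_{i,b} = -1 \end{cases} \tag{25}$$

where the quantities

$$\Lambda^{\overline{\mathrm{MAP}}}_{i,b} = \begin{cases} \lambda^{\overline{\mathrm{MAP}}}_{i,b} - L^A_{i,b}, & x^{\mathrm{MAP}}_{i,b} = +1 \\ \lambda^{\overline{\mathrm{MAP}}}_{i,b} + L^A_{i,b}, & x^{\mathrm{MAP}}_{i,b} = -1 \end{cases} \tag{26}$$

will be referred to as the extrinsic metrics. For the following developments it will be convenient
to define the function $f(\cdot)$ that transforms an intrinsic metric $\lambda$ with associated a priori LLR $L^A$
and binary label $x$ to an extrinsic metric $\Lambda$ according to

$$\Lambda = f(\lambda, L^A, x) = \begin{cases} \lambda - L^A, & x = +1 \\ \lambda + L^A, & x = -1. \end{cases} \tag{27}$$

With this notation, we can rewrite (26) more compactly as $\Lambda^{\overline{\mathrm{MAP}}}_{i,b} = f\big(\lambda^{\overline{\mathrm{MAP}}}_{i,b}, L^A_{i,b}, x^{\mathrm{MAP}}_{i,b}\big)$. The
inverse function of (27) transforms an extrinsic metric $\Lambda$ to an intrinsic metric $\lambda$ and is given
by

$$\lambda = f^{-1}(\Lambda, L^A, x) = \begin{cases} \Lambda + L^A, & x = +1 \\ \Lambda - L^A, & x = -1. \end{cases} \tag{28}$$
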

We emphasize that the tree-search algorithm described in the following produces the extrinsic
LLRs $L^E_{i,b}$ ($\forall i, b$) in (25) rather than the intrinsic ones in (10). Since the soft-output STS-SD
algorithm described in [7] delivers $L^D_{i,b}$, and $L^E_{i,b} = L^D_{i,b}$ holds only in the soft-output case (i.e., if
$L^A_{i,b} = 0$, $\forall i, b$), careful modification of the list administration steps, the tree-pruning criterion,
and the LLR clipping rules of the soft-output STS-SD algorithm is needed.
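
The metric transformations in (27) and (28) are simple sign-dependent offsets. The following sketch, with hypothetical function names and example numbers, implements both and checks that the extrinsic LLR obtained from the extrinsic metric equals the intrinsic LLR minus the a priori LLR.

```python
def f(lam, LA, x):
    """Map an intrinsic metric to an extrinsic metric, cf. (27)."""
    return lam - LA if x == +1 else lam + LA

def f_inv(Lam, LA, x):
    """Map an extrinsic metric back to an intrinsic metric, cf. (28)."""
    return Lam + LA if x == +1 else Lam - LA

def extrinsic_llr(lam_map, Lam_counter, x_map):
    """Extrinsic LLR from the MAP metric and the extrinsic counter-hypothesis metric."""
    return Lam_counter - lam_map if x_map == +1 else lam_map - Lam_counter

# Example: MAP metric 1.0, counter-hypothesis metric 4.0, a priori LLR 0.5.
lam_map, lam_counter, LA = 1.0, 4.0, 0.5
for x_map in (+1, -1):
    LD = lam_counter - lam_map if x_map == +1 else lam_map - lam_counter
    LE = extrinsic_llr(lam_map, f(lam_counter, LA, x_map), x_map)
    print(x_map, LD, LE)       # LE equals LD - LA in both cases
```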

A. List Administration 

The main idea of the SISO STS paradigm is to search the subtree originating from a given
node only if the result can lead to an update of either $\lambda^{\mathrm{MAP}}$ or of at least one of the $\Lambda^{\overline{\mathrm{MAP}}}_{i,b}$. To this
end, the SD algorithm needs to maintain a list containing the current MAP hypothesis $\mathbf{x}^{\mathrm{MAP}}$, the
corresponding metric $\lambda^{\mathrm{MAP}}$, and all $Q M_T$ extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{i,b}$. The algorithm is initialized
with $\lambda^{\mathrm{MAP}} = \Lambda^{\overline{\mathrm{MAP}}}_{i,b} = \infty$ and $x^{\mathrm{MAP}}_{i,b} = 1$ ($\forall i, b$). Whenever a leaf node with corresponding label
$\mathbf{x}$ has been reached, the detector distinguishes between two cases:

i) MAP hypothesis update: If $d(\mathbf{x}) < \lambda^{\mathrm{MAP}}$, a new MAP hypothesis has been found. First,
all extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{i,b}$ for which $x_{i,b} = \overline{x^{\mathrm{MAP}}_{i,b}}$ are updated according to

$$\Lambda^{\overline{\mathrm{MAP}}}_{i,b} \leftarrow f\big(\lambda^{\mathrm{MAP}}, L^A_{i,b}, \overline{x^{\mathrm{MAP}}_{i,b}}\big)$$

followed by the updates $\lambda^{\mathrm{MAP}} \leftarrow d(\mathbf{x})$ and $\mathbf{x}^{\mathrm{MAP}} \leftarrow \mathbf{x}$. In other words, for each bit in the MAP
hypothesis that is changed in the update process, the metric associated with the former MAP
hypothesis becomes the extrinsic metric of the new counter-hypothesis.

ii) Extrinsic metric update: In the case where $d(\mathbf{x}) \geq \lambda^{\mathrm{MAP}}$, only extrinsic metrics
corresponding to counter-hypotheses might be updated. For each $i = 1, \ldots, M_T$, $b = 1, \ldots, Q$
with $x_{i,b} = \overline{x^{\mathrm{MAP}}_{i,b}}$ and $f\big(d(\mathbf{x}), L^A_{i,b}, x^{\mathrm{MAP}}_{i,b}\big) < \Lambda^{\overline{\mathrm{MAP}}}_{i,b}$, the SISO STS-SD algorithm performs the
update

$$\Lambda^{\overline{\mathrm{MAP}}}_{i,b} \leftarrow f\big(d(\mathbf{x}), L^A_{i,b}, x^{\mathrm{MAP}}_{i,b}\big). \tag{29}$$


B. Extrinsic LLR Clipping 

In order to ensure that the extrinsic LLRs delivered by the algorithm indeed satisfy
$|L^E_{i,b}| \leq L_{\max}$ ($\forall i, b$), the following update rule

$$\Lambda^{\overline{\mathrm{MAP}}}_{i,b} \leftarrow \min\big\{ \Lambda^{\overline{\mathrm{MAP}}}_{i,b},\; \lambda^{\mathrm{MAP}} + L_{\max} \big\}, \quad \forall i, b \tag{30}$$

has to be applied after carrying out the steps in Case i) of the list administration procedure
described in Section III-A. Fig. 2 illustrates the principle of extrinsic LLR clipping. The search
for counter-hypotheses associated with extrinsic metrics is constrained to a hypersphere of
radius $r = \sqrt{\lambda^{\mathrm{MAP}} + L_{\max}}$ around the (transformed) received signal vector $\tilde{\mathbf{y}}$. In Section VI-B1,
it will be demonstrated numerically that incorporating the constraint $|L^E_{i,b}| \leq L_{\max}$ directly into
the tree search significantly reduces complexity. We emphasize that for $L_{\max} = \infty$ the detector
attains max-log optimal SISO performance, whereas for $L_{\max} = 0$, the LLRs satisfy $L^E_{i,b} = 0$ ($\forall i, b$)
and the hard-output MAP solution (8) is obtained.
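
The list administration rules of Section III-A together with the clipping rule (30) can be sketched as a small bookkeeping class. This is an illustrative reading of the two update cases, not the authors' implementation: the class name `StsList`, the flat bit indexing (one index per pair $(i, b)$), and the example leaf sequence are assumptions, and the tree traversal that would feed leaves into `leaf()` is omitted.

```python
import math

def f(lam, LA, x):
    """Intrinsic-to-extrinsic metric mapping, cf. (27)."""
    return lam - LA if x == +1 else lam + LA

class StsList:
    def __init__(self, LA, L_max=math.inf):
        self.LA = list(LA)                     # a priori LLR per bit
        self.L_max = L_max                     # clipping parameter
        self.lam_map = math.inf                # metric of the MAP hypothesis
        self.x_map = [+1] * len(self.LA)       # current MAP label
        self.Lam = [math.inf] * len(self.LA)   # extrinsic counter-hypothesis metrics

    def leaf(self, x, d):
        """Process a leaf with label x (entries +1/-1) and metric d."""
        if d < self.lam_map:                   # case i): MAP hypothesis update
            for k, (xk, xm) in enumerate(zip(x, self.x_map)):
                if xk != xm:                   # bit flips: old MAP becomes counter-hypothesis
                    self.Lam[k] = f(self.lam_map, self.LA[k], xk)
            self.lam_map, self.x_map = d, list(x)
            # clipping rule (30), applied after the case i) updates
            self.Lam = [min(L, self.lam_map + self.L_max) for L in self.Lam]
        else:                                  # case ii): extrinsic metric update
            for k, (xk, xm) in enumerate(zip(x, self.x_map)):
                if xk != xm:
                    cand = f(d, self.LA[k], xm)
                    if cand < self.Lam[k]:
                        self.Lam[k] = cand

    def extrinsic_llrs(self):
        return [L - self.lam_map if xm == +1 else self.lam_map - L
                for L, xm in zip(self.Lam, self.x_map)]

lst = StsList(LA=[0.0, 0.0], L_max=0.25)
for label, metric in [((+1, +1), 1.0), ((+1, -1), 3.0), ((-1, -1), 0.5)]:
    lst.leaf(label, metric)
print(lst.x_map, lst.extrinsic_llrs())
```

In this example the unclipped extrinsic LLRs would both be -0.5; with the assumed clipping parameter of 0.25 they are limited to -0.25.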

C. The Tree-Pruning Criterion 

Consider the node $\mathbf{s}^{(i)}$ on level $i$ corresponding to the label bits $x_{j,b}$ ($j = i, \ldots, M_T$,
$b = 1, \ldots, Q$). Assume that the subtree originating from this node and corresponding to the label
bits $x_{j,b}$ ($j = 1, \ldots, i - 1$, $b = 1, \ldots, Q$) has not been expanded yet. The tree-pruning criterion
for the node $\mathbf{s}^{(i)}$ along with its subtree is compiled from two sets, defined as follows:

1) The bits in the partial label $\mathbf{x}^{(i)}$ corresponding to the node $\mathbf{s}^{(i)}$ are compared with the
corresponding bits in the label of the current MAP hypothesis. All extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{j,b}$
with $x_{j,b} = \overline{x^{\mathrm{MAP}}_{j,b}}$ found in this comparison may be affected when searching the subtree
originating from $\mathbf{s}^{(i)}$. As $d(\mathbf{x}^{(i)})$ is an intrinsic metric, the extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{j,b}$ need to be
mapped to intrinsic metrics according to (28). The resulting set of intrinsic metrics, which
may be affected by an update, is given by

$$\mathcal{A}_1\big(\mathbf{x}^{(i)}\big) = \Big\{ f^{-1}\big(\Lambda^{\overline{\mathrm{MAP}}}_{j,b}, L^A_{j,b}, x^{\mathrm{MAP}}_{j,b}\big) \;\Big|\; j \geq i,\ \forall b,\ x_{j,b} = \overline{x^{\mathrm{MAP}}_{j,b}} \Big\}.$$

2) The extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{j,b}$ for $j = 1, \ldots, i - 1$, $b = 1, \ldots, Q$ corresponding to the
counter-hypotheses in the subtree of $\mathbf{s}^{(i)}$ may be affected as well. Correspondingly, we define

$$\mathcal{A}_2\big(\mathbf{x}^{(i)}\big) = \Big\{ f^{-1}\big(\Lambda^{\overline{\mathrm{MAP}}}_{j,b}, L^A_{j,b}, x^{\mathrm{MAP}}_{j,b}\big) \;\Big|\; j < i,\ \forall b \Big\}.$$

The intrinsic metrics which may be affected during the search in the subtree originating from
node $\mathbf{s}^{(i)}$ are given by $\mathcal{A}(\mathbf{x}^{(i)}) = \{a_l\} = \mathcal{A}_1(\mathbf{x}^{(i)}) \cup \mathcal{A}_2(\mathbf{x}^{(i)})$. The node $\mathbf{s}^{(i)}$ along with its subtree
is pruned if the corresponding PD $d(\mathbf{x}^{(i)})$ satisfies the tree-pruning criterion

$$d\big(\mathbf{x}^{(i)}\big) > \max_{a_l \in \mathcal{A}(\mathbf{x}^{(i)})} a_l.$$

This tree-pruning criterion ensures that a given node and the entire subtree originating from that
node are explored only if this could lead to an update of either $\lambda^{\mathrm{MAP}}$ or of at least one of the
extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{i,b}$. Note that $\lambda^{\mathrm{MAP}}$ does not appear in the set $\mathcal{A}(\mathbf{x}^{(i)})$, as the update criteria
given in Section III-A ensure that $\lambda^{\mathrm{MAP}}$ is always smaller than or equal to all intrinsic metrics
associated with the counter-hypotheses.
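
The two-set construction of the pruning threshold can be sketched as a single predicate. The data layout is an assumption made for illustration: bits are keyed by pairs `(j, b)` with 1-based level `j`, and `partial` contains the bits already fixed at levels `j >= level`. If the candidate set is empty, this sketch conservatively does not prune; that corner case is a choice of the sketch and is not specified in the text above.

```python
import math

def f_inv(Lam, LA, x):
    """Extrinsic-to-intrinsic metric mapping, cf. (28)."""
    return Lam + LA if x == +1 else Lam - LA

def should_prune(pd, level, partial, x_map, Lam, LA):
    """Return True if the node with partial distance pd can be pruned.

    partial: {(j, b): bit} for all levels j >= level (bits fixed so far);
    x_map, Lam, LA: dicts keyed by (j, b) over all bits.
    """
    candidates = []
    for key, x_hyp in x_map.items():
        j, _ = key
        in_a1 = j >= level and partial[key] != x_hyp   # fixed bit differs from MAP bit
        in_a2 = j < level                              # bit still open in the subtree
        if in_a1 or in_a2:
            candidates.append(f_inv(Lam[key], LA[key], x_hyp))
    return pd > max(candidates, default=math.inf)

# Two streams, one bit each; the detector stands at level 2.
x_map = {(1, 1): +1, (2, 1): +1}
Lam = {(1, 1): 2.0, (2, 1): 5.0}
LA = {(1, 1): 0.0, (2, 1): 0.0}
print(should_prune(3.0, 2, {(2, 1): +1}, x_map, Lam, LA))   # only the level-1 metric matters
print(should_prune(3.0, 2, {(2, 1): -1}, x_map, Lam, LA))   # level-2 counter-hypothesis also matters
```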

IV. Channel-Matrix Preprocessing 

In this section, we describe how performing the QR-decomposition (QRD) on a column- 
sorted and regularized version of the channel matrix H in combination with compensation of 
self-interference in the LLRs — caused by channel-matrix regularization — carried out directly in 
the tree search can result in a significant complexity reduction at negligible performance loss. 
The use of column-sorting and regularization for soft-output SD was discussed in detail in [7]. 
We shall therefore keep the discussion of the general aspects short and emphasize the aspects 
corresponding to self-interference compensation. 
A. Column-Sorting and Regularization of the Channel Matrix

Methods for column-sorting and regularization of the channel matrix $\mathbf{H}$ performed on the
basis of the received symbol vector $\mathbf{y}$ have been discussed, e.g., in [27], [28]. Unfortunately,
such techniques require QRD at symbol-vector rate, which leads to a significant computational
burden. In contrast, column-sorting and regularization based solely on the channel matrix $\mathbf{H}$
(and possibly on the noise variance) require QRDs only when the channel state changes, which
entails a significantly smaller computational burden.

1) Column-sorting: The complexity of SD can be reduced (often significantly) by performing
the QRD on a column-sorted version of $\mathbf{H}$ rather than on $\mathbf{H}$ directly, i.e., by computing
$\mathbf{H}\mathbf{P} = \mathbf{Q}\mathbf{R}$, where $\mathbf{P}$ is an $M_T \times M_T$ permutation matrix. Reduction in terms of complexity
is obtained if levels closer to the root correspond to main-diagonal entries of $\mathbf{R}$ with larger
magnitude, or equivalently, to spatial streams with higher effective SNR. A corresponding
computationally efficient heuristic was proposed in [29] and is referred to as sorted QRD (SQRD)
in the following.

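2) Regularization: A further reduction in terms of complexity (at the cost of slightly reduced
performance) can be obtained by performing the tree search on a Tikhonov-regularized (and
column-sorted) version of $\mathbf{H}$ according to [30]

$$\underbrace{\begin{bmatrix} \mathbf{H} \\ \alpha \mathbf{I}_{M_T} \end{bmatrix}}_{\bar{\mathbf{H}}} \mathbf{P} = \underbrace{\begin{bmatrix} \mathbf{Q}_a & \mathbf{Q}_c \\ \mathbf{Q}_b & \mathbf{Q}_d \end{bmatrix}}_{\bar{\mathbf{Q}}} \begin{bmatrix} \tilde{\mathbf{R}} \\ \mathbf{0}_{M_R \times M_T} \end{bmatrix} \tag{31}$$

where $\alpha \in \mathbb{R}$ is a suitably chosen regularization parameter. Here, $\tilde{\mathbf{R}}$ and $\bar{\mathbf{Q}}$ are partitioned such
that $\tilde{\mathbf{R}}$, $\mathbf{Q}_a$, $\mathbf{Q}_b$, $\mathbf{Q}_c$, and $\mathbf{Q}_d$ are of dimension $M_T \times M_T$, $M_R \times M_T$, $M_T \times M_T$, $M_R \times M_R$,
and $M_T \times M_R$, respectively. The computational complexity for regularized SQRD as compared
to non-regularized SQRD is approximately 50% higher [31]. However, the QRD needs to be
performed only if the channel matrix $\mathbf{H}$ changes, as opposed to the tree-search itself, which
needs to be carried out at symbol-vector rate.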

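LLR computation (and MAP detection) based on regularized SQRD corresponds to replacing the modified input-output relation by

$$\tilde{\mathbf{y}} = \mathbf{R}\tilde{\mathbf{s}} + \tilde{\mathbf{n}} \qquad (32)$$

where $\tilde{\mathbf{y}} = \mathbf{Q}_a^H \mathbf{y}$, $\tilde{\mathbf{s}} = \mathbf{P}^T \mathbf{s}$, and $\tilde{\mathbf{n}} = -\alpha \mathbf{Q}_b^H \mathbf{s} + \mathbf{Q}_a^H \mathbf{n}$. The corresponding intrinsic (max-log) LLRs are obtained by pretending that the resulting noise $\tilde{\mathbf{n}}$ has the same statistics as $\mathbf{n}$, which leads to

$$
\tilde{L}^D_{i,b} = \min_{\mathbf{s} \in \mathcal{X}^{(-1)}_{i,b}} \left\{ \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 - \log P[\tilde{\mathbf{s}}] \right\}
- \min_{\mathbf{s} \in \mathcal{X}^{(+1)}_{i,b}} \left\{ \frac{1}{N_0}\|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 - \log P[\tilde{\mathbf{s}}] \right\}
\qquad (33)
$$

where $P[\tilde{\mathbf{s}}] = P[\mathbf{s}]$. The intrinsic LLRs $\tilde{L}^D_{i,b}$ in (33) will, in general, only be approximations to the true intrinsic LLRs $L^D_{i,b}$ in (5). This is a consequence of $\tilde{\mathbf{n}}$ no longer being i.i.d. circularly symmetric complex Gaussian distributed with variance $N_0$ per complex entry, as it contains self-interference (i.e., it depends on $\mathbf{s}$) and $\mathbf{Q}_a$ is, in general, not unitary. Setting $\alpha = \sqrt{N_0 / \mathbb{E}[|s|^2]}$, where we note that $\mathbb{E}[|s_i|^2] = \mathbb{E}[|s|^2]$, $\forall i$, leads to the so-called minimum mean-square error (MMSE) SQRD [32] and has been shown in [7] to result in a good performance/complexity tradeoff for soft-output STS-SD. In the remainder of this paper, regularization will always refer to using MMSE-SQRD. Finally, we note that the LLRs in (33) need to be reordered after the detection stage to account for the permutation induced by $\mathbf{P}$.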
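
The relation between the regularized QR factors and the effective noise term can be checked numerically. The following is a minimal sketch under stated assumptions: column sorting is omitted (the permutation is taken to be the identity), the 16-QAM alphabet is normalized to unit average energy, and all variable names and parameter values are illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
M_R, M_T, N0 = 4, 4, 0.1   # illustrative dimensions and noise variance

# 16-QAM alphabet normalized to unit average energy (assumption of this sketch).
levels = np.array([-3, -1, 1, 3])
alphabet = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(10)
Es = np.mean(np.abs(alphabet) ** 2)
alpha = np.sqrt(N0 / Es)   # MMSE choice of the regularization parameter

H = (rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))) / np.sqrt(2)
s = rng.choice(alphabet, size=M_T)
n = np.sqrt(N0 / 2) * (rng.standard_normal(M_R) + 1j * rng.standard_normal(M_R))
y = H @ s + n

# QR decomposition of the augmented matrix [H; alpha*I] (no column sorting).
H_aug = np.vstack([H, alpha * np.eye(M_T)])
Q_full, R_full = np.linalg.qr(H_aug, mode="complete")
R = R_full[:M_T, :]
Q_a, Q_b = Q_full[:M_R, :M_T], Q_full[M_R:, :M_T]

# Transformed receive vector and effective noise (self-interference plus noise).
y_tilde = Q_a.conj().T @ y
n_tilde = -alpha * Q_b.conj().T @ s + Q_a.conj().T @ n
residual = np.linalg.norm(y_tilde - (R @ s + n_tilde))
```

In this sketch `residual` vanishes up to rounding, which illustrates that the effective noise depends on the transmitted vector whenever `alpha` is non-zero.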

B. Compensation of Self-Interference 

Using the approximate (max-log) LLRs in (33) with $\alpha \neq 0$ instead of the exact max-log LLRs in (5) results in a performance loss. In order to recover part of this performance loss, a method for the compensation of self-interference was developed in [33] for list-based MIMO detectors. The approach described in [33] cannot be applied directly to SISO STS-SD. It turns out, however, that compensation of self-interference can be incorporated directly into the tree-search procedure. This leads to a noticeable performance improvement compared to using (33), while the corresponding increase in complexity is negligible (corresponding simulation results are shown in Section VI-B3).

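1) Compensation of self-interference: As shown in [33], the squared Euclidean distance $\|\bar{\mathbf{y}} - \bar{\mathbf{H}}\mathbf{s}\|^2$, with $\bar{\mathbf{y}} = [\mathbf{y}^T \; \mathbf{0}_{1 \times M_T}]^T$ and with $\bar{\mathbf{H}}$ denoting the augmented channel matrix in (31), can be expanded in two different ways according to

$$\|\bar{\mathbf{y}} - \bar{\mathbf{H}}\mathbf{s}\|^2 = \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 + \alpha^2 \|\mathbf{s}\|^2 \qquad (34)$$

$$\|\bar{\mathbf{y}} - \bar{\mathbf{H}}\mathbf{s}\|^2 = \|\bar{\mathbf{Q}}^H \bar{\mathbf{y}} - \bar{\mathbf{R}}\mathbf{P}^T\mathbf{s}\|^2 = \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 + \|\mathbf{Q}_c^H \mathbf{y}\|^2 \qquad (35)$$

where $\bar{\mathbf{Q}}$ and $\bar{\mathbf{R}}$ denote the unitary and the upper-triangular factor in (31), respectively, and (35) is obtained by using (31). Equating the RHS terms of (34) and (35) and using $\|\tilde{\mathbf{s}}\|^2 = \|\mathbf{s}\|^2$ yields

$$\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 = \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 + \|\mathbf{Q}_c^H \mathbf{y}\|^2 - \alpha^2 \|\tilde{\mathbf{s}}\|^2 \qquad (36)$$

which allows us to conclude that the metric $\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2$ contains a contribution that is independent of the symbol vectors, namely $\|\mathbf{Q}_c^H \mathbf{y}\|^2$, and a term caused by self-interference given by $-\alpha^2 \|\tilde{\mathbf{s}}\|^2$.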

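Since we use $\|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2$ (instead of the left-hand side of (36)) in the LLR computation (33), the two remaining RHS terms in (36) must be compensated. As already observed in Section III-C, constant terms (i.e., terms that are independent of $\mathbf{s}$) cancel out in the LLR computation (10) and can therefore be neglected without affecting the resulting LLRs, whereas the term $-\alpha^2 \|\tilde{\mathbf{s}}\|^2$ does depend on $\mathbf{s}$ and therefore needs to be compensated. This is accomplished by computing the self-interference free (SIF) intrinsic max-log LLRs according to [33]

$$
L^{\mathrm{SIF}}_{i,b} = \min_{\mathbf{s} \in \mathcal{X}^{(-1)}_{i,b}} \left\{ \frac{1}{N_0}\left( \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 - \alpha^2 \|\tilde{\mathbf{s}}\|^2 \right) - \log P[\tilde{\mathbf{s}}] \right\}
- \min_{\mathbf{s} \in \mathcal{X}^{(+1)}_{i,b}} \left\{ \frac{1}{N_0}\left( \|\tilde{\mathbf{y}} - \mathbf{R}\tilde{\mathbf{s}}\|^2 - \alpha^2 \|\tilde{\mathbf{s}}\|^2 \right) - \log P[\tilde{\mathbf{s}}] \right\}
\qquad (37)
$$

We emphasize, however, that (37) remains an approximation to (5), as the noise term $\tilde{\mathbf{n}}$ resulting from (32) is not i.i.d. circularly symmetric complex Gaussian distributed with variance $N_0$ per complex entry.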
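
The expansion of the Euclidean distance into a QR-based term, a constant, and a self-interference term can be verified numerically. The sketch below assumes no column sorting (identity permutation) and uses arbitrary illustrative dimensions and an arbitrary value of the regularization parameter; none of these values are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
M_R, M_T, alpha = 4, 3, 0.5   # arbitrary illustrative values
H = rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))
y = rng.standard_normal(M_R) + 1j * rng.standard_normal(M_R)
s = rng.standard_normal(M_T) + 1j * rng.standard_normal(M_T)

# QR decomposition of the augmented matrix [H; alpha*I] and its partitioning.
Q_full, R_full = np.linalg.qr(np.vstack([H, alpha * np.eye(M_T)]), mode="complete")
R = R_full[:M_T, :]
Q_a, Q_c = Q_full[:M_R, :M_T], Q_full[:M_R, M_T:]

# Left-hand side: true Euclidean distance.
lhs = np.linalg.norm(y - H @ s) ** 2
# Right-hand side: QR-based metric + constant term - self-interference term.
rhs = (np.linalg.norm(Q_a.conj().T @ y - R @ s) ** 2
       + np.linalg.norm(Q_c.conj().T @ y) ** 2
       - alpha ** 2 * np.linalg.norm(s) ** 2)
```

Both sides agree up to rounding for any vector `s`, not only for vectors drawn from a symbol alphabet.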

2) Compensation in the SISO STS-SD algorithm: In [33] it was suggested to compensate self-interference after the tree search. For the SISO STS-SD algorithm, extrinsic LLRs are computed only on the basis of the MAP hypothesis $\mathbf{x}^{\mathrm{MAP}}$, its metric $\Lambda^{\mathrm{MAP}}$, and the extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{i,b}$ (see (25)). Compensation of self-interference according to (37) after carrying out the SISO STS-SD algorithm would additionally require explicit knowledge of the symbol vectors associated with these metrics, which is, in general, not available. Inspection of (37) suggests, however, that self-interference compensation may be incorporated into the tree-search procedure. A straightforward modification of the DIs $|e_i|$ in (23) to accomplish this would lead to the modified DIs

$$|e_i| - \frac{\alpha^2}{N_0} |\tilde{s}_i|^2$$

which are, however, no longer guaranteed to be non-negative. As in the tightening of the tree-pruning criterion described in Section III-B, we recognize that symbol-vector-independent terms can be added to the DIs without loss of (max-log) optimality. Therefore, setting the DIs to

$$|\hat{e}_i| = |e_i| + m(\tilde{s}_i) \qquad (38)$$

with the non-negative term

$$m(\tilde{s}_i) = \frac{\alpha^2}{N_0} \left( \max_{s \in \mathcal{O}} |s|^2 - |\tilde{s}_i|^2 \right) \qquad (39)$$

leads to the smallest possible non-negative DIs that compensate self-interference directly in the tree search. Note that adding non-negative terms to the DIs as done in (38), in general, increases the (tree-search) complexity. Recall, however, that channel-matrix regularization itself almost always significantly reduces complexity [7], so that this increase, which is shown numerically in Section VI-B3 to be marginal, is tolerable. In addition, it turns out that self-interference compensation recovers the performance loss due to channel-matrix regularization to a point where near-max-log optimal performance is achieved (see Section VI-B3). In the case of constant-modulus symbol alphabets (e.g., BPSK or 4-QAM) we have $m(\tilde{s}_i) = 0$ ($i = 1, \ldots, M_T$) and compensation of self-interference in the tree search does not even increase complexity. We conclude by noting that the quantities $\max_{s \in \mathcal{O}} |s|^2$ can be pre-computed and hence, the additional computational complexity required to incorporate compensation of self-interference into the tree-search procedure is small.
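
As a small worked example, the non-negative compensation term can be tabulated once per alphabet. The sketch below assumes unit-average-energy alphabets and the MMSE choice of the regularization parameter; the helper name `sif_offsets` and the noise variance are hypothetical.

```python
import numpy as np

def sif_offsets(alphabet, alpha, N0):
    """Return m(s) = alpha^2 / N0 * (max |s'|^2 - |s|^2) for every symbol s."""
    energy = np.abs(np.asarray(alphabet)) ** 2
    return (alpha ** 2 / N0) * (energy.max() - energy)

# Unit-average-energy alphabets (assumption of this sketch).
levels = np.array([-3, -1, 1, 3])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(10)
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

N0 = 0.1
alpha = np.sqrt(N0)            # MMSE choice for unit-energy alphabets
m16 = sif_offsets(qam16, alpha, N0)
m4 = sif_offsets(qam4, alpha, N0)
```

For 16-QAM the offsets take only three distinct values (one per symbol energy), so a three-entry table suffices; for 4-QAM all offsets are zero, in line with the constant-modulus remark above.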

V. LLR Correction 

The max-log approximation, channel-matrix regularization, and other complexity-reducing mechanisms, such as early termination of the tree search [7], lead to LLRs that are approximations to the true LLRs. However, channel decoders (see Fig. 1) rely on exact LLRs to achieve optimum performance. In the following, we present a post-processing method for correcting approximate LLRs resulting from sub-optimum detectors. This method is based on ideas developed in [34] and [35] and is able to (often significantly) improve the performance of (iterative) MIMO decoders while requiring low additional computational complexity.

A. The Basic Idea 

We start by defining (or recalling the definitions of) the following objects (see Fig. 3):

• the effective channel with the binary-valued inputs $x_{i,b} \in \{-1, +1\}$ and the associated a priori LLRs $L^A_{i,b}$, and outputs given by the (possibly approximated) extrinsic LLRs $\tilde{L}^E_{i,b}$.

• the physical MIMO channel with input $\mathbf{s}$ and output $\mathbf{y}$.

• the soft-input soft-output MIMO detector with inputs $\mathbf{y}$ and $L^A_{i,b}$ and outputs $\tilde{L}^E_{i,b}$.

• the LLR correction unit (see Fig. 3), which computes corrected extrinsic LLRs $\hat{L}^E_{i,b}$, based on (approximated) extrinsic LLRs $\tilde{L}^E_{i,b}$ and on side information $\mathcal{Z}$, by applying an LLR correction function

$$\hat{L}^E_{i,b} = f\big(\tilde{L}^E_{i,b}, \mathcal{Z}\big) \qquad (40)$$

• the side information $\mathcal{Z}$, which is, for example, obtained from the (instantaneous) receive SNR, the singular values of the channel matrix $\mathbf{H}$, and from knowledge of whether the soft-input soft-output MIMO detector was terminated prematurely [7].

For the LLR correction function to yield valid LLRs, we define

$$f\big(\tilde{L}^E_{i,b}, \mathcal{Z}\big) = \log \frac{P\big[x_{i,b} = +1 \mid \tilde{L}^E_{i,b}, \mathcal{Z}\big]}{P\big[x_{i,b} = -1 \mid \tilde{L}^E_{i,b}, \mathcal{Z}\big]} \qquad (41)$$

Just like the LLRs are computed based on the received vector $\mathbf{y}$ and the channel state $\mathbf{H}$, the corrected LLRs are computed based on the (approximated) extrinsic LLRs $\tilde{L}^E_{i,b}$ and the side information $\mathcal{Z}$. The formulation (40) and (41) entails that $\hat{L}^E_{i,b}$ depends only on $\tilde{L}^E_{i,b}$ (and $\mathcal{Z}$) rather than on all extrinsic (approximated) LLR values $\tilde{L}^E_{i,b}$ (for all $i, b$). Making the correction function depend on other LLR values, besides the one to be corrected, would certainly improve the correction performance, but at the same time also dramatically increase the computational effort for LLR correction, as will become clear in the discussion of the numerical procedure for LLR correction proposed below.

The main idea is now, depending on the mechanisms used to approximate the extrinsic LLRs (e.g., the max-log approximation, channel-matrix regularization, early termination of the tree search), to extract suitable side information $\mathcal{Z}$. To see that this is non-trivial and that the problem is multi-faceted, simply note that the set of all possible channel matrices $\mathbf{H}$ is a continuum of $M_R \times M_T$ complex-valued matrices. This continuum will be absorbed in $\mathcal{Z}$ through, e.g., the singular values or the rank of $\mathbf{H}$. We emphasize that in practice, LLR correction requires that the set $\mathcal{Z}$ be finite. In addition, the individual entries of $\mathcal{Z}$ must have finite cardinality as well. Hence, continuous-valued quantities, such as, e.g., the SNR or singular values, must be suitably quantized. The total number of different instances of the side information $\mathcal{Z}$ is denoted by $Z$ in the following.

B. Computation of the LLR Correction Function 

Once we have chosen $\mathcal{Z}$, the LLR correction function (41) is, in principle, obtained from the conditional probabilities $P\big[x_{i,b} = \pm 1 \mid \tilde{L}^E_{i,b}, \mathcal{Z}\big]$. Analytical expressions for correction functions seem very hard to obtain (even for simple examples such as Hagenauer's approximation to the box function [34]). We next propose an approach for numerically computing (approximations to) the LLR correction function in (41).

First, the range of the LLRs to be corrected needs to be constrained (motivated, e.g., by the use of LLR clipping) to $\tilde{L}^E_{i,b} \in [-L_{\max}, +L_{\max}]$. This interval is then divided into $K$ equally-sized bins such that the $k$th bin corresponds to

$$B_k = \left[ -L_{\max} + k \frac{2 L_{\max}}{K}, \; -L_{\max} + (k+1) \frac{2 L_{\max}}{K} \right], \quad k = 0, \ldots, K-1.$$

Then, the histogram

$$p_k(\mathcal{Z}) = P\big[x_{i,b} = +1 \mid \tilde{L}^E_{i,b} \in B_k, \mathcal{Z}\big], \quad k = 0, \ldots, K-1 \qquad (42)$$

can be computed by performing Monte-Carlo simulations (averaged over noise and channel realizations) with randomly generated bits $x_{i,b}$. For each $\tilde{L}^E_{i,b}$ and a given instance of $\mathcal{Z}$, the (approximated) LLR correction function is obtained by linear interpolation between the base points

$$\left( -L_{\max} + \left(k + \tfrac{1}{2}\right) \frac{2 L_{\max}}{K}, \; \log \frac{p_k(\mathcal{Z})}{1 - p_k(\mathcal{Z})} \right), \quad k = 0, \ldots, K-1. \qquad (43)$$

We emphasize that for each instance of $\mathcal{Z}$, in general, a different LLR correction function is obtained. Note that the LLRs resulting from (43) can have a magnitude that is larger than $L_{\max}$ (see Section VI-D). The corrected LLRs can be clipped again to satisfy $|\hat{L}^E_{i,b}| \le L_{\max,c}$, where $L_{\max,c} \ge L_{\max}$, thereby limiting the dynamic range of LLRs (rather than performing LLR clipping for complexity reduction and tuning of the detector as done so far).

The computational complexity needed to compute the histogram (42) and the corresponding storage requirements depend critically on the number of bins $K$ and on the total number $Z$ of different instances of the side information $\mathcal{Z}$. In particular, $ZK$ histogram values need to be stored and hence, it is important to keep both $Z$ and $K$ small. Application of the LLR correction function itself amounts to simple table look-up operations followed by linear interpolation, which can be performed at very low computational complexity.
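
The histogram-based procedure can be sketched for a toy setting. This is an illustrative sketch only: it assumes a single instance of the side information, places the base points at the bin centers, and replaces the MIMO detector by a scalar BPSK link over AWGN whose LLRs are deliberately overconfident by a factor of two and then clipped. The function names and all parameter values are hypothetical.

```python
import numpy as np

def fit_llr_correction(llr_approx, bits, l_max, num_bins):
    """Estimate base points (bin centers, corrected LLRs) from training data.

    llr_approx: approximate LLRs, already confined to [-l_max, +l_max]
    bits:       transmitted bits in {-1, +1}
    """
    width = 2.0 * l_max / num_bins
    idx = np.floor((llr_approx + l_max) / width).astype(int)
    idx = np.clip(idx, 0, num_bins - 1)            # +l_max falls into the last bin
    total = np.bincount(idx, minlength=num_bins)
    plus = np.bincount(idx, weights=(bits > 0).astype(float), minlength=num_bins)
    # Histogram p_k = P[x = +1 | LLR in bin k]; kept away from 0 and 1 so that
    # the logarithm stays finite (a pragmatic safeguard of this sketch).
    p = np.clip(plus / np.maximum(total, 1), 1e-6, 1 - 1e-6)
    centers = -l_max + (np.arange(num_bins) + 0.5) * width
    return centers, np.log(p / (1.0 - p))

def apply_llr_correction(llr_approx, centers, corrected):
    """Table look-up followed by linear interpolation."""
    return np.interp(llr_approx, centers, corrected)

# Toy training set: BPSK over AWGN with noise variance 2, so the exact LLR is y.
rng = np.random.default_rng(3)
bits = rng.choice([-1.0, 1.0], size=200_000)
y = bits + np.sqrt(2.0) * rng.standard_normal(bits.size)
L_MAX, K = 4.0, 8
llr_approx = np.clip(2.0 * y, -L_MAX, L_MAX)      # overconfident, then clipped

centers, corrected = fit_llr_correction(llr_approx, bits, L_MAX, K)
fixed = apply_llr_correction(np.array([-1.5, 1.5]), centers, corrected)
```

In this toy setting the fitted function roughly halves LLRs in the interior of the range, while the outermost bins absorb the clipped values and are mapped to comparatively larger magnitudes. With several instances of the side information, one such table would be stored per instance.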

C. An Example 

We next discuss an example that illustrates the impact (and importance) of LLR correction. The complexity of the SISO STS-SD algorithm depends critically on the noise realization $\mathbf{n}$, the channel-matrix realization $\mathbf{H}$, the transmit vector $\mathbf{s}$, and the a priori LLRs $L^A_{i,b}$. The often prohibitively high worst-case complexity of SD [36] constitutes a problem in many practical application scenarios, as it prevents meeting the throughput requirements of many of the available communication standards. A promising approach to limiting the worst-case complexity of SD, while keeping the resulting performance degradation small, was proposed in [37], [7]. The basic idea is to impose an aggregate complexity constraint of $N D_{\mathrm{avg}}$ visited nodes for a block of $N$ symbol vectors by using maximum-first (MF) scheduling. This scheduling strategy allocates the overall complexity budget according to

$$D_{\max}[j] = N D_{\mathrm{avg}} - \sum_{\ell=1}^{j-1} D[\ell] - (N - j) M, \quad j = 1, \ldots, N \qquad (44)$$

where $M$ is a parameter to be specified below and $D[\ell]$ is the actual number of nodes visited in the detection of the $\ell$th symbol vector with a corresponding maximum complexity constraint of $D_{\max}[\ell]$, i.e., the detector is terminated if $D_{\max}[\ell]$ nodes have been visited and the LLRs are obtained from the current MAP hypothesis, the associated metric $\Lambda^{\mathrm{MAP}}$, and the current extrinsic metrics $\Lambda^{\overline{\mathrm{MAP}}}_{i,b}$. The main idea realized by the policy (44) is that detection of the $j$th symbol vector is allowed to use up all of the remaining complexity budget within the block of $N$ symbol vectors up to $(N - j)M$ nodes, i.e., the parameter $M$ determines that in decoding the remaining $N - j$ symbol vectors, we can afford a budget of at least $M$ nodes per symbol vector. Setting $M = M_T$ and choosing $D_{\mathrm{avg}} \ge M_T$ (which is what is used in the remainder of the paper) ensures that for each of the remaining $N - j$ symbol vectors at least the hard-output successive interference cancellation (SIC) solution is found [7]. For details on early termination and scheduling, we refer to [37], [7].
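
The budget allocation can be illustrated with a short simulation. The sketch below is hypothetical in its inputs: `node_demand` stands for the number of nodes an unconstrained search would visit for each symbol vector, and the numbers are made up for illustration.

```python
def mf_budget(j, visited, n_vectors, d_avg, m):
    """Maximum node budget D_max[j] for the j-th symbol vector (1-based).

    visited: node counts D[1], ..., D[j-1] of the symbol vectors detected so far
    """
    assert len(visited) == j - 1
    return n_vectors * d_avg - sum(visited) - (n_vectors - j) * m

def run_block(node_demand, d_avg, m):
    """Simulate early termination: each vector uses min(demand, budget) nodes."""
    n_vectors = len(node_demand)
    visited, terminated = [], []
    for j, demand in enumerate(node_demand, start=1):
        budget = mf_budget(j, visited, n_vectors, d_avg, m)
        visited.append(min(demand, budget))
        terminated.append(demand > budget)    # early-termination flag
    return visited, terminated

# Block of N = 4 symbol vectors with D_avg = 16 and M = 4 (illustrative values).
visited, terminated = run_block(node_demand=[10, 40, 6, 30], d_avg=16, m=4)
# visited    -> [10, 40, 6, 8]
# terminated -> [False, False, False, True]
```

The second symbol vector may exceed the average budget because the first one used fewer nodes, while the last one is terminated early once the aggregate budget of 64 nodes is exhausted.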

Now, if early termination happens before the extrinsic metric $\Lambda^{\overline{\mathrm{MAP}}}_{i,b}$ was updated from its initial value $\infty$, the corresponding LLR satisfies $|\tilde{L}^E_{i,b}| = L_{\max}$, as only LLR clipping according to (30) was performed. Hence, early termination may result in LLRs with a higher reliability than they would actually have if no complexity constraints were imposed. This calls for LLR correction with the goal of reducing the magnitude of such LLRs. Consequently, the side information set $\mathcal{Z}$ should contain a binary-valued state variable, which indicates whether early termination occurred or not. Corresponding numerical results are provided in Section VI-D1.

VI. Simulation Results

Unless explicitly stated otherwise, all simulation results are for a convolutionally encoded (rate $R = 1/2$, generator polynomials $[133_o \; 171_o]$, and constraint length 7) iterative MIMO-OFDM system with $M_T = M_R = 4$, 16-QAM constellation $\mathcal{O}$ with Gray labeling, 64 OFDM tones, TGn type C channel model [38], and are based on a max-log BCJR channel decoder [39]. One frame consists of 1024 randomly interleaved (across space and frequency) bits corresponding to one (spatial) OFDM symbol and we assume that the bits $x_{i,b}$ ($\forall i, b$) are statistically independent. The SNR is per receive antenna and the SNR values specified in the figures are in decibels (dB). The number of iterations $I$ is the number of times the soft-input soft-output MIMO detector (and the SISO channel decoder) is used, i.e., $I = 1$ corresponds to soft-output SD in [7]. The LLR clipping parameters shown in the simulation results correspond to normalized LLR clipping parameters according to $L_{\max}/N_0$.

A. Tightening of the Tree-Pruning Criterion 

1) Impact of the Euclidean-distance term: The goal of the simulation results shown in Table I is to quantify the impact of the Euclidean-distance term $\frac{1}{N_0} \big| \tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j \big|^2$ in the bias (15) on the complexity reduction obtained by tightening the tree-pruning criterion according to (17). To this end, we set the prior term to zero, i.e., $\log P[\mathbf{s}] = 0$, and compare the complexity resulting from the tightened tree-pruning criterion to that of the standard tree-pruning criterion (denoted by "std." in Table I). We observe that the complexity reduction obtained by tightening the tree-pruning criterion based on the Euclidean-distance term only is marginal, in particular in light of the prohibitive effort required to compute (17).

2) Impact of the prior term: Next, we start with uniform priors, i.e., $L^A_{i,b} = 0$ ($\forall i, b$) for the first iteration, and perform tightening of the tree-pruning criterion according to (19). Table II shows that removing the bias $|p_i|$ leads to a dramatic reduction in terms of complexity, ranging from 65.9% to 99.5%. Furthermore, we can see that the impact on complexity reduction in the second iteration ($I = 2$) is less pronounced (but still significant) than in the first iteration. This behavior can be explained by noting that for $I = 1$ the priors satisfy $L^A_{i,b} = 0$, which leads to the largest possible values for $|p_i|$, $i = 1, \ldots, M_T$. We note that, in general, the impact on complexity reduction is further reduced with increasing $I$.

We can now conclude that removing the Euclidean-distance component of the bias term (15) is not worth the effort. In contrast, tightening of the tree-pruning criterion based on the prior only (19) leads to a significant complexity reduction and requires no additional computational complexity if the individual bits $x_{i,b}$ ($\forall i, b$) are statistically independent (see (24) in Section III-C). In the remainder of this paper, we always employ tightening of the tree-pruning criterion according to (19).

B. Performance/Complexity Tradeoffs 

The performance/complexity tradeoffs discussed next and quantified in Figs. 4-6, 8, and 9 refer to the cumulative (tree-search) complexity in terms of the total number of nodes visited (averaged over independent channel, noise, and data realizations) for SISO detection over $I$ iterations, designated as "average complexity" from now on. The computational complexity incurred by channel decoding is ignored in the following. The minimum SNR required to achieve a given frame error rate (FER) is referred to as the "SNR operating point" for that FER.

1) Impact of LLR clipping: From Fig. 4 we can conclude that LLR clipping allows for a smooth performance/complexity tradeoff, adjustable through a single parameter, namely the LLR clipping parameter $L_{\max}$. Note that for a fixed SNR operating point, the minimum complexity is not necessarily achieved by maximizing the number of iterations. The performance corresponding to the case where clipping of the extrinsic LLRs is performed after the tree search, i.e., LLR clipping is not incorporated into the tree search, is that obtained for $L_{\max} = \infty$. We can therefore conclude that incorporating LLR clipping into the tree search is of paramount importance as it reduces the complexity substantially and renders the detector easily adjustable in terms of performance versus complexity.

2) Column-sorting and regularization: We next examine the impact of column-sorting and regularization of the channel matrix on the performance/complexity tradeoff. It can be seen in Fig. 5 that in the low-complexity regime, the Pareto-optimal tradeoff curve is achieved by MMSE-SQRD. In the high-complexity regime, the performance loss incurred by regularization renders MMSE-SQRD inferior to un-regularized SQRD. This observation has already been made for the soft-output-only case in [7], but is also valid for $I > 1$ using SISO STS-SD.

3) Self-interference free LLRs: Fig. 5 also quantifies the impact of compensating self-interference (according to Section IV-B2) on the performance/complexity tradeoff. We observe that compensation of self-interference results in a performance improvement in terms of SNR operating point of 0.3 dB to 0.5 dB in almost all regions. In the high-complexity regime, un-regularized SQRD outperforms channel-matrix regularization and has an SNR operating point that is 0.15 dB below that obtained in the SIF case.

C. Comparison with List Sphere Decoding 

Fig. 6 compares the performance/complexity tradeoff achieved by list sphere decoding (LSD) as proposed in [1] to that obtained through SISO STS-SD. For the LSD algorithm, we take the complexity to equal the number of nodes visited when building the initial candidate list. The (often significant) computational burden incurred by list administration in LSD is neglected, leading to a complexity measure that favors the LSD algorithm. We can draw the following conclusions from Fig. 6:

i) SISO STS-SD outperforms LSD for all SNR operating points.

ii) LSD requires relatively large list sizes and hence a large amount of memory to approach (max-log) optimum SISO performance.† The underlying reason is that LSD obtains extrinsic LLRs from a candidate list that has been computed around the maximum-likelihood solution, i.e., in the absence of a priori information. In contrast, SISO STS-SD requires memory mainly for the extrinsic metrics, which are obtained through a search that is concentrated around the MAP solution. Consequently, SISO STS-SD tends to require (often significantly) less memory than LSD.

Besides LSD, various other SISO detection algorithms for MIMO systems have been developed, see e.g., [2]-[6], [41]. The algorithms described in [3] and [6] are related to LSD but require rebuilding the candidate list in each iteration; this can lead to a substantial complexity increase compared to LSD. For the algorithms in [2] and [4], indicators of potentially high computational complexity include the need for multiple matrix inversions for each symbol vector in each iteration.* In contrast, the QRD required for SD has to be computed only when the channel state changes. The computational complexity of the list-sequential (LISS) algorithm in [5], [41] seems difficult to relate to the complexity measure employed in this paper. However, due to the need for sorting of candidate vectors and the structural similarity of the LISS algorithm to LSD, we expect the performance/complexity tradeoff realized by the LISS algorithm to be comparable to that of the LSD algorithm.

†In addition to the memory requirements, the search-and-replace operations required in the LSD algorithm's list administration quickly lead to prohibitively high VLSI implementation complexity when the list size grows [40].

D. Impact of LLR Correction 

Fig. 7 shows examples of LLR correction functions of SISO STS-SD obtained by linear interpolation using $K = 31$ bins and side information given by

$$\mathcal{Z} = \{ L_{\max}, D_{\mathrm{avg}}, \mathrm{SNR}, T \} \qquad (45)$$

where $L_{\max} = 0.2$, $D_{\mathrm{avg}} \in \{16, \infty\}$, SNR = 16 dB, and $T \in \{0, 1\}$ indicates whether early termination occurred ($T = 1$) or not ($T = 0$). Here, the number of instances of $\mathcal{Z}$ is given by $Z = 4$. Note that in practice, the parameters $L_{\max}$, $D_{\mathrm{avg}}$, and SNR in $\mathcal{Z}$ remain constant as long as the channel state remains constant, whereas $T$ may change at symbol-vector rate, i.e., depending on $T$, different LLR correction functions need to be applied to the extrinsic LLRs $\tilde{L}^E_{i,b}$. We compare the LLR correction functions corresponding to SISO STS-SD using column sorting (SQRD), regularization and column sorting (MMSE-SQRD), and compensation of self-interference in combination with MMSE-SQRD, all having unconstrained maximum complexity (i.e., $D_{\mathrm{avg}} = \infty$ and, hence, $T = 0$). We furthermore show the correction function of SIF (MMSE-SQRD) LLRs in combination with MF scheduling for $D_{\mathrm{avg}} = 16$ (denoted by "MF16" in Fig. 7) and $T = 1$. The following observations can be made:

i) For unconstrained complexity, i.e., $D_{\mathrm{avg}} = \infty$, LLRs corresponding to $\pm L_{\max}$ are corrected to LLRs with larger magnitude; this is a result of clipping LLRs with magnitude larger than $L_{\max}$ to $\pm L_{\max}$. We note that since the LLR correction functions are obtained by binning and linear interpolation, LLR values that have slightly smaller (mandated by the bin width) magnitude than $L_{\max}$ are also corrected to values larger than $L_{\max}$.

ii) For early termination with MF scheduling (i.e., $D_{\mathrm{avg}} = 16$ and $T = 1$), LLRs with magnitude close to $L_{\max}$ are corrected to LLRs with smaller magnitude (i.e., their reliability is reduced). LLRs corresponding to $\tilde{L}^E_{i,b} = \pm L_{\max}$ are, as already mentioned in Section V-C, often caused by early termination and hence, are corrected to less reliable LLR values.

iii) The LLR correction function associated with column sorting (SQRD) only is almost a linear function with slope one, i.e., $\hat{L}^E_{i,b} = \tilde{L}^E_{i,b}$, which indicates that little correction is performed. The reason for this behavior is that column sorting maintains (max-log) optimality and the impact of the max-log approximation on performance is small, in general [12]. The correction functions associated with channel-matrix regularization show a stronger deviation from $\hat{L}^E_{i,b} = \tilde{L}^E_{i,b}$ (cf. the zoom in Fig. 7), indicating that more correction is required, since regularization leads to an approximation of the max-log LLRs (see Section IV-A2).

*A detailed complexity analysis of the algorithm described in [2] based on VLSI implementation results can be found in [12].

1) Performance/complexity tradeoff for SISO STS-SD with early termination: Fig. 8 shows the performance/complexity tradeoff for early termination based on MF scheduling with and without LLR correction. The side information was chosen according to (45) and the LLR correction function was computed based on $K = 31$ bins with linear interpolation. Depending on the average run-time constraint, LLR correction can reduce (i.e., improve) the SNR operating point by up to 3 dB. As expected, the performance gains resulting from LLR correction are more pronounced for larger clipping parameters, as in these cases performance is dominated by the run-time constraint and early termination happens more often. Note that LLR correction also yields slight performance gains for small LLR clipping levels, where the run-time constraints do not affect performance. This indicates that LLR correction can also correct, at least partly, the errors induced by LLR clipping and by channel-matrix regularization.

2) Performance/complexity tradeoff for parallel concatenated turbo codes: The next simulation result is aimed at understanding which of the conclusions drawn so far change in the presence of more sophisticated channel codes. To this end, we evaluated the performance/complexity tradeoff for a parallel-concatenated turbo code (PCTC) of rate 1/2 (punctured, memory 2, and generator polynomial $[7_o \; 5_o]$, where $7_o$ pertains to the feedback path) with eight iterations in the turbo decoder. We use the interleaver specified in the 3GPP standard [42] with 508 information bits. One code block corresponds to 1024 coded bits including two times four bits for termination of the trellises. For aggressive LLR clipping, simulation results have shown that using the sum-product algorithm within the turbo decoder requires precise (and hence, corrected) LLRs to yield satisfactory results, whereas max-log-based decoders seem to be more robust to effects incurred by LLR clipping [12]. Since we employ the sum-product BCJR algorithm [39] for decoding of the PCTCs, LLR correction is used.

The results in Fig. 9 indicate that the performance/complexity tradeoff achieved by the PCTC in the first iteration is significantly better than that obtained for the convolutional code (CC) used in the previous simulations. In the second iteration, the performance/complexity tradeoff is almost identical for both codes. For $I > 2$, the CC slightly outperforms the PCTC, which could be due to the fact that we use a turbo code with very short block length and a channel model that exhibits correlation across frequency and space (see, e.g., [43]).

E. Information Transfer Characteristics 

In order to characterize the performance of soft-input soft-output MIMO detectors independently of the channel code and channel decoder, we compute information transfer characteristics (ITCs) using an i.i.d. (across space and OFDM tones) Rayleigh multi-path fading channel model and assuming a Gaussian model for the a priori LLRs according to [44]

where $n$ is a real-valued Gaussian RV with zero mean and variance $\sigma^2$. The a priori information content is determined by $\sigma^2$ and characterized by the mutual information between the transmitted bits $x_{i,b}$ and the a priori input of the SISO detector, i.e., $I_A = I(x_{i,b}; L^A_{i,b})$ (in bits per binary symbol), where $0 \le I_A \le 1$. Note that large and small values of $\sigma^2$ reduce and increase the mutual information $I_A$, respectively. The extrinsic information at the output of the detector (averaged over all transmit antennas and bits) is defined as

$$I_E = \frac{1}{M_T Q} \sum_{i=1}^{M_T} \sum_{b=1}^{Q} I\big(x_{i,b}; L^E_{i,b}\big)$$

in bits per binary symbol, where $0 \le I_E \le 1$. Note that $I_A = 0$ corresponds to soft-output-only MIMO detection. The ITC corresponds to the function $I_E = f(I_A)$, for a given SNR, and enables us to assess the performance of soft-input soft-output MIMO detectors in a fundamental way. Note that the application of ITCs as described here was originally proposed in [45, Chapter 16].
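
Mutual information between bits and LLRs can be estimated from samples. The following sketch uses the sample-average estimator $1 - \mathbb{E}[\log_2(1 + e^{-xL})]$, which assumes equiprobable bits in $\{-1, +1\}$ and LLRs that are exact for their channel. The Gaussian LLR model used to generate test data (exact LLRs of a BPSK link over AWGN) is an assumption of this sketch and not necessarily the model of [44]; all names and values are illustrative.

```python
import numpy as np

def mutual_information(bits, llrs):
    """Estimate I(x; L) in bits per binary symbol from samples.

    Uses I = 1 - E[log2(1 + exp(-x L))]; valid for equiprobable bits in
    {-1, +1} and LLRs that are exact for the underlying channel.
    """
    z = -np.asarray(bits) * np.asarray(llrs)
    return 1.0 - np.mean(np.logaddexp(0.0, z)) / np.log(2.0)

def awgn_llrs(bits, noise_var, rng):
    """Exact LLRs of a BPSK link over AWGN with the given noise variance."""
    noise = np.sqrt(noise_var) * rng.standard_normal(bits.size)
    return 2.0 / noise_var * (bits + noise)

rng = np.random.default_rng(4)
bits = rng.choice([-1.0, 1.0], size=100_000)

i_good = mutual_information(bits, awgn_llrs(bits, 0.5, rng))   # small variance
i_poor = mutual_information(bits, awgn_llrs(bits, 4.0, rng))   # large variance
i_none = mutual_information(bits, np.zeros_like(bits))         # no information
```

In this model a larger noise variance lowers the estimated mutual information, and all-zero LLRs yield zero information. For approximate (non-exact) LLRs, a histogram-based estimate of the mutual information would be the safer choice.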
1) Impact of LLR clipping: Fig. 10 shows that a normalized LLR clipping parameter of $L_{\max} = 0.4$ achieves almost the same ITC as max-log optimal SISO STS-SD with $L_{\max} = \infty$. Hence, increasing the LLR clipping parameter to a value above 0.4 does not further improve performance of the detector and only leads to an increase in complexity. We note that the same observation was made in the performance/complexity tradeoff simulations in Fig. 4.

2) Performance comparison with LSD: Fig. 11 compares the ITC of SISO STS-SD to that of LSD [1]. For $I_A$ close to 1, LSD requires large list sizes to yield a performance close to that of the max-log-optimal SISO STS-SD algorithm. Note that even hard-output MAP detection (which corresponds to SISO STS-SD with $L_{\max} = 0$) can outperform LSD, in terms of ITCs, if $I_A$ is close to 1 and the list size is small. We can therefore conclude that SISO STS-SD has a fundamental performance advantage over LSD, which is in agreement with the observations made in Section VI-C.

F. Approaching Outage-Capacity with SISO STS-SD 

We finally compare the performance obtained with SISO STS-SD to outage capacity using the TGn type C channel model [38]. To this end, we define the $\epsilon$-outage capacity $C_{\mathrm{out},\epsilon}$ as [46],



where $\mathcal{H} = \{\mathbf{H}[1], \ldots, \mathbf{H}[N]\}$ contains the $M_R \times M_T$ channel matrices for the $N = 64$ OFDM tones and [48]



The FER is lower-bounded by the outage probability (46) according to [49]

$$P\big[ I(\mathrm{SNR}, \mathcal{H}) < R M_T Q \big] \le \mathrm{FER}(\mathrm{SNR})$$

where the information rate per OFDM tone is given by $R M_T Q$. The performance comparison consists of setting the outage probability and the FER to 1% and identifying the corresponding SNR operating points. Fig. 12 shows the corresponding results for SISO STS-SD with different modulation schemes for $I = 1$ and $I = 8$. Note that the LLR clipping parameters are chosen so as to minimize complexity while retaining near-max-log optimal performance at 1% FER (i.e., we used $L_{\max} = 0.1$, $L_{\max} = 0.4$, $L_{\max} = 2.0$, and $L_{\max} = 6.0$ for 64-QAM, 16-QAM, QPSK, and BPSK, respectively). We can see that SISO STS-SD operates between 1.5 dB (for 4-QAM) and 5.3 dB SNR (for 64-QAM) away from outage capacity.
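
An outage probability of the kind used in this comparison can be estimated by Monte-Carlo simulation. The sketch below rests on several assumptions that are not taken from the paper: it uses the standard Gaussian-input mutual information $\log_2 \det(\mathbf{I} + \frac{\mathrm{SNR}}{M_T}\mathbf{H}\mathbf{H}^H)$, a single i.i.d. Rayleigh-fading tone instead of the TGn type C model, and no averaging over OFDM tones. The function name and default values are hypothetical.

```python
import numpy as np

def outage_probability(snr_db, rate, m_r=4, m_t=4, trials=2000, seed=5):
    """Monte-Carlo estimate of P[I(SNR, H) < rate] for i.i.d. Rayleigh fading."""
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_db / 10.0)
    H = (rng.standard_normal((trials, m_r, m_t))
         + 1j * rng.standard_normal((trials, m_r, m_t))) / np.sqrt(2.0)
    gram = H @ np.conj(np.swapaxes(H, 1, 2))           # H H^H per realization
    _, logdet = np.linalg.slogdet(np.eye(m_r) + (snr / m_t) * gram)
    info = logdet / np.log(2.0)                         # bits per channel use
    return float(np.mean(info < rate))

# Rate R * M_T * Q for a rate-1/2 code, four streams, and 16-QAM (Q = 4).
rate = 0.5 * 4 * 4
p_low = outage_probability(-10.0, rate)
p_mid = outage_probability(10.0, rate)
p_high = outage_probability(30.0, rate)
```

Because the same seed (and hence the same channel realizations) is used for every SNR value, the estimate is non-increasing in the SNR. The SNR at which it crosses 1% would be the outage-based reference point against which a measured FER curve is compared.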

VII. Conclusion 

We proposed a soft-input soft-output MIMO detector based on single tree-search sphere 
decoding (STS-SD) as introduced in [7], [8]. Besides adapting the single-tree search paradigm to 
account for soft-inputs, key to obtaining low complexity of the proposed algorithm are tightening 
of the tree-pruning criterion, clipping of the extrinsic LLRs built into the tree search, and a novel 
method for incorporating compensation of self-interference in LLRs — caused by channel-matrix 
regularization — into the tree search. Finally, we proposed an LLR correction method, which was 
demonstrated to achieve substantial performance improvements at low additional computational 
complexity. Our simulation results showed that the SISO STS-SD algorithm offers a wide range 
of performance/complexity tradeoffs, clearly outperforms state-of-the-art SISO detectors for 
MIMO systems, and achieves close-to-optimal — in the sense of outage capacity — performance. 
Fig. 1. Iterative MIMO decoder. SISO STS-SD (corresponding to the dashed box) directly computes extrinsic LLRs.

Fig. 3. LLR correction post-processes the LLRs resulting from the effective channel using side information $\mathcal{Z}$.

Fig. 4. Performance/complexity tradeoff of SISO STS-SD with SQRD. The numbers next to the curves correspond to normalized LLR clipping parameters.

Fig. 5. Performance/complexity tradeoff of SISO STS-SD with SQRD, MMSE-SQRD, and SIF MMSE-SQRD. The numbers next to the curves correspond to normalized LLR clipping parameters.

Fig. 6. Performance/complexity tradeoff of LSD [1] and SISO STS-SD, both using SQRD. The numbers next to the curves correspond to the list size for LSD and to normalized LLR clipping parameters for SISO STS-SD.

Fig. 8. Impact of LLR correction. The solid lines correspond to the performance obtained with LLR correction, whereas the dotted lines pertain to un-corrected LLRs. Both variants employ early termination with MF scheduling and compensation of self-interference in the LLRs in combination with MMSE-SQRD.

Fig. 9. Performance/complexity tradeoff of SISO STS-SD with SQRD (without regularization). Comparison between parallel-concatenated turbo codes (PCTCs) and convolutional codes (CCs).

June 4, 2009 DRAFT 

[Plot for Fig. 10 not reproduced. Horizontal axis: I_A [bits per binary symbol], ranging from 0 to 1.] 

Fig. 10. ITC of SISO STS-SD at SNR = 12 dB for different (normalized) LLR clipping parameters. 

Fig. 11. ITC of LSD compared to hard-output MAP and (max-log) optimal SISO STS-SD performance at SNR = 12 dB. 

TABLE I 

Average complexity reduction obtained by tightening of the tree-pruning criterion based on the 
Euclidean-distance term only 

SNR      L_max     std. [nodes]   tight [nodes]   reduction 
10 dB    0.0125    34.9           34.4            1.4% 
10 dB    ∞         328.3          327.8           0.2% 
20 dB    0.0125    11.0           10.8            1.8% 
20 dB    ∞         227.2          227.0           0.1% 

TABLE II 

Average complexity reduction obtained by tightening of the tree-pruning criterion based on the prior 
term only 

SNR      I    L_max     std. [nodes]   tight [nodes]   reduction 
10 dB    1    0.0125    1890.4         34.9            98.2% 
10 dB    1    ∞         2440.2         328.3           86.5% 
10 dB    2    0.0125    1630.6         43.4            97.3% 
10 dB    2    ∞         2148.4         406.6           81.1% 
20 dB    1    0.0125    1914.7         11.0            99.4% 
20 dB    1    ∞         2397.0         227.2           90.5% 
20 dB    2    0.0125    1228.7         6.2             99.5% 
20 dB    2    ∞         361.9          132.4           65.9% 
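
The reduction column in Tables I and II appears to be the relative decrease in the average number of visited nodes, i.e., (std - tight)/std. The following Python sketch recomputes it for a few entries of Table II under this assumption; the function name `reduction_percent` is hypothetical, and the second field of each row is assumed to be the iteration index.

```python
def reduction_percent(std_nodes, tight_nodes):
    """Relative decrease in visited nodes, in percent (assumed definition)."""
    return 100.0 * (std_nodes - tight_nodes) / std_nodes


# (SNR, assumed iteration index, clipping parameter, std. [nodes], tight [nodes]),
# copied from Table II; "inf" denotes an unbounded clipping parameter.
rows = [
    ("10 dB", 1, "0.0125", 1890.4, 34.9),
    ("10 dB", 1, "inf", 2440.2, 328.3),
    ("20 dB", 2, "0.0125", 1228.7, 6.2),
    ("20 dB", 2, "inf", 361.9, 132.4),
]

for snr, iteration, l_max, std_nodes, tight_nodes in rows:
    value = reduction_percent(std_nodes, tight_nodes)
    print(f"{snr}, iteration {iteration}, clipping {l_max}: {value:.1f}%")
```

Under this assumed definition, the recomputed values match the printed ones to rounding, except for the last row (361.9 and 132.4 nodes), which evaluates to about 63.4% rather than 65.9%; one of the printed entries in that row may therefore be inaccurate.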